Performance Analysis of Diabetes Detection Using Machine Learning Classifiers
Abstract
Diabetes is a chronic medical condition that has long posed a serious public health challenge in Canada and worldwide, affecting millions of people and placing pressure on healthcare resources. Conventional diagnostic procedures sometimes rely on few data points and are prone to error, which can result in premature action. In addition, the slow adoption of modern machine learning (ML) technologies in the healthcare industry may stem from a limited understanding of how these systems reach their decisions. This study aims to address that gap by applying various ML algorithms to the PIMA Indians Diabetes Dataset, provided by the National Institute of Diabetes and Digestive and Kidney Diseases, with the goal of improving the validity of diabetes prediction and diagnosis. Three types of ML classifiers are used: tree-based, function-based, and rule-based. The results show that Stochastic Gradient Descent (SGD, function), Logistic Regression (function), JRip (rules), and Random Forest (trees) are among the top-performing classifiers. The classifiers are evaluated on several metrics: accuracy, precision, recall, specificity, F1 score, Matthews correlation coefficient (MCC), and area under the ROC curve (ROC area). Although SGD performs well on almost all metrics, its low recall indicates that it is not the best choice. Because recall is prioritized in clinical diagnostics, Random Forest emerges as a strong candidate owing to its balanced performance across the key metrics.
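The evaluation metrics named in the abstract can be made concrete with a short sketch. The Python below, using only the standard library, computes each metric from binary labels, assuming label 1 marks a diabetic (positive) case. The function names and the toy labels and scores in the usage example are hypothetical and illustrative only; this is not the study's code, and the numbers are not results from the PIMA dataset.

```python
import math
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN, FN, assuming label 1 is the positive (diabetic) class."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-based metrics computed from the confusion counts."""
    c = confusion_counts(y_true, y_pred)
    tp, fp, tn, fn = c["tp"], c["fp"], c["tn"], c["fn"]
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)  # sensitivity: share of diabetic cases caught
    specificity = _safe_div(tn, tn + fp)  # share of non-diabetic cases cleared
    f1 = _safe_div(2 * precision * recall, precision + recall)
    # Matthews correlation coefficient uses all four cells of the confusion matrix.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, mcc_den)
    return {
        "accuracy": _safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC area as the probability that a random positive scores higher
    than a random negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # one missed diabetic case (FN)
    scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.05]
    for name, value in classification_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.4f}")
    print(f"roc_auc: {roc_auc(y_true, scores):.4f}")
```

In this toy case recall is 3 / (3 + 1) = 0.75: each false negative is a diabetic patient the model fails to flag, which is why a model with otherwise strong scores but low recall, as the abstract reports for SGD, is a weaker fit for clinical screening.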

This work is licensed under a Creative Commons Attribution 4.0 International License.