Performance Analysis of Diabetes Detection Using Machine Learning Classifiers

Hung Vu Trung Huynh
Liu Hui
Ngoc Han Nguyen
Ruixuan Qiao

Abstract

Diabetes is a chronic medical condition that has long posed severe public health challenges, not only in Canada but worldwide, affecting millions of people and straining healthcare resources. Conventional diagnostic procedures sometimes depend on few data points and are prone to mistakes, which can result in premature action. Additionally, the slow adoption of modern machine learning (ML) technologies in healthcare may stem from a limited understanding of how these systems make decisions. This study seeks to fill that gap by applying various ML algorithms to the PIMA Indians Diabetes Dataset, provided by the National Institute of Diabetes and Digestive and Kidney Diseases, with the aim of improving the validity of diabetes prediction and diagnosis. Three types of ML classifiers are used: tree-based, function-based, and rule-based. The results show that Stochastic Gradient Descent (SGD, function-based), Logistic Regression (function-based), JRip (rule-based), and Random Forest (tree-based) are among the top-performing classifiers. Classifiers are evaluated on accuracy, precision, recall, specificity, F1 score, Matthews correlation coefficient (MCC), and ROC area. Although SGD performs well on almost all metrics, its low recall indicates that it is not the optimal algorithm. Because recall is prioritized in clinical diagnostics, Random Forest emerges as a strong candidate owing to its balanced performance across key metrics.
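As an illustration of how the threshold-based metrics named in the abstract relate to one another, the following minimal Python sketch derives them from the four cells of a binary confusion matrix. The function name `classification_metrics` and the counts in the usage line are hypothetical and are not taken from the paper's experiments. ROC area is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Derive common binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives, and false negatives. Ratios with a zero denominator
    are reported as 0.0 (a convention chosen for this sketch).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews correlation coefficient: uses all four cells of the matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not results from the study).
metrics = classification_metrics(tp=60, fp=20, tn=80, fn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, precision (0.75) and specificity (0.80) look acceptable while recall is only 0.60, meaning 40 of 100 positive cases are missed. This is the kind of pattern that makes a classifier with strong overall scores a weak choice for screening.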

Article Details

How to Cite
Performance Analysis of Diabetes Detection Using Machine Learning Classifiers. (2024). International Journal of Management and Data Analytics, 4(1), 43-54. https://ijmada.com/index.php/ijmada/article/view/50
Section
Student Paper
