Performance of Machine Learning Classifiers for Diabetes Prediction

Mijala Manandhar
Shaikat Baidya
Babalpreet Kaur
Katia Atoji

Abstract

This study evaluates how effectively machine learning (ML) classifiers predict diabetes using the Pima Indians Diabetes Database. The dataset comprises 768 instances with nine attributes, one of which is the target variable indicating whether a patient tested positive for diabetes. The classifiers were grouped into Functions (Logistic Regression, Multilayer Perceptron, Stochastic Gradient Descent), Rules (Decision Table, JRip, OneR), and Trees (Decision Stump, Hoeffding Tree, J48). They were compared on accuracy, precision, recall, Matthews Correlation Coefficient (MCC), ROC Area, and F1-measure. Among the Functions classifiers, Stochastic Gradient Descent (SGD) performed best, particularly in handling large datasets and minimizing overfitting. Logistic Regression and Multilayer Perceptron also showed robust results, but SGD was superior on most metrics. Among the Rules classifiers, JRip outperformed the others owing to its iterative rule optimization, whereas OneR's simplicity resulted in the lowest performance. Decision Table offered a clear representation of decision rules but was limited by the complexity of the dataset. In the Trees group, J48 was the most effective, benefiting from its ability to handle complex interactions and numerous features. The study highlights the potential of ML algorithms for early diabetes detection, which enables timely intervention and personalized management strategies, and it emphasizes the importance of key predictors such as plasma glucose, BMI, and age. Future research should integrate multiple datasets and explore more complex ML algorithms to improve prediction accuracy and generalization. Developing real-time predictive systems is crucial for improving clinical processes and patient outcomes.
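For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how they can be computed for the positive class of a binary problem. It is not the authors' code, and the counts and scores in the usage example are made up for illustration rather than taken from the study. The function names `classification_metrics` and `roc_auc` are hypothetical. Evaluation tools may also report class-weighted averages, which this sketch does not compute.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Positive-class metrics from confusion-matrix counts (illustrative helper)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators so degenerate inputs return 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    # Matthews Correlation Coefficient: ranges from -1 to 1, 0 is chance level.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


def roc_auc(labels, scores):
    """ROC area as the probability that a positive outranks a negative.

    Uses the pairwise (Mann-Whitney) formulation; ties count as half a win.
    Labels are assumed to be 1 for positive and 0 for negative.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up counts and scores, for illustration only.
    print(classification_metrics(tp=40, fp=10, fn=20, tn=30))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

MCC is worth reporting alongside accuracy on a dataset like this one because it uses all four cells of the confusion matrix and so is less flattering to a classifier that mostly predicts the majority class.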

Article Details

How to Cite
Performance of Machine Learning Classifiers for Diabetes Prediction. (2024). International Journal of Management and Data Analytics, 4(1), 1-8. https://ijmada.com/index.php/ijmada/article/view/39
Section
Regular Paper
