Performance of Machine Learning Classifiers for Diabetes Prediction
Abstract
This study evaluates machine learning (ML) classifiers for predicting diabetes using the Pima Indians Diabetes Database. The dataset comprises 768 instances with nine attributes, one of which is the target variable indicating whether a patient tested positive for diabetes. The classifiers were grouped into Function (Logistic Regression, Multilayer Perceptron, Stochastic Gradient Descent), Rules (Decision Table, JRip, OneR), and Trees (Decision Stump, Hoeffding Tree, J48). The classifiers were compared on accuracy, precision, recall, Matthews Correlation Coefficient (MCC), ROC Area, and F1-measure. Among the Function classifiers, Stochastic Gradient Descent (SGD) performed best, with particular strengths in handling large datasets and limiting overfitting. Logistic Regression and Multilayer Perceptron also produced robust results, but SGD was superior on most metrics. Among the Rules classifiers, JRip outperformed the others because of its iterative rule optimization, whereas OneR, which is the simplest of the three, had the lowest performance. Decision Table offered a clear representation of decision rules but was limited by the complexity of the dataset. In the Trees group, J48 was the most effective, benefiting from its ability to handle complex interactions and numerous features. The study highlights the potential of ML algorithms for early diabetes detection, which enables timely intervention and personalized management strategies, and it emphasizes the importance of key predictors such as plasma glucose, BMI, and age. Future research should integrate multiple datasets and explore more complex ML algorithms to improve prediction accuracy and generalization. Developing real-time predictive systems is also crucial for improving clinical processes and patient outcomes.
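As an illustration of the metrics named in the abstract, the following minimal sketch computes each of them for the positive class. It is not the authors' evaluation code. The function names `binary_metrics` and `roc_auc`, the confusion-matrix counts, and the example scores are all hypothetical, and the values reported in the study may have been averaged across classes rather than computed for one class as here.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Threshold-based metrics for the positive class from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews Correlation Coefficient: ranges from -1 to 1, 0 means chance level.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


def roc_auc(labels, scores):
    """ROC area as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not results from the study.
    for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.3f}")
    # Hypothetical labels (1 = tested positive) and predicted probabilities.
    print(f"roc_auc: {roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]):.3f}")
```

MCC and ROC Area are useful alongside accuracy on this kind of data because they remain informative when one class is more frequent than the other.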
This work is licensed under a Creative Commons Attribution 4.0 International License.