Explainable AI Approach in Diabetes Disease Classification Using Gradient Boosting Algorithm
Abstract
The classification of diabetes has become a critical focus in the healthcare domain due to the disease's rising prevalence and its severe impact on global health. While machine learning methods such as the Gradient Boosting Algorithm (GBA) have shown strong performance in predicting diabetes, the limited interpretability of these models remains a barrier to practical use in clinical settings. This study introduces an Explainable AI (XAI) approach to enhance the transparency and interpretability of the Gradient Boosting Algorithm for diabetes classification. Using clinical indicators such as HbA1c levels, Body Mass Index (BMI), and other risk factors, the model achieves high classification accuracy while exposing feature contributions through visualization techniques. SHAP (SHapley Additive exPlanations) was used for detailed global and local explanations, while LIME (Local Interpretable Model-agnostic Explanations) offered localized insights into individual predictions. LightGBM (LGBoost) and XGBoost were compared on the same clinical dataset: LightGBM achieved an accuracy of 97.27%, and XGBoost slightly outperformed it with an accuracy of 97.36%, a margin of 0.09 percentage points that suggests only a marginal advantage on this dataset. The results demonstrate the potential of integrating XAI into machine learning workflows to balance performance and interpretability, thereby fostering trust among healthcare practitioners and aiding informed decision-making. This research contributes to advancing the application of explainable models in medical diagnostics.
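To make the idea behind the SHAP explanations concrete, the following is a minimal sketch of exact Shapley value attribution on a toy risk score. It is not the authors' model, dataset, or the SHAP library: the function `toy_model`, its thresholds and weights, and the `patient` and `reference` records are all hypothetical and chosen only for illustration. Absent features are replaced by the values of a single reference record, which is one simple way to define the contribution of a feature subset.

```python
from itertools import permutations


def toy_model(x):
    """Hypothetical rule-based risk score standing in for a trained booster."""
    score = 0.0
    if x["hba1c"] >= 6.5:
        score += 0.5
    if x["bmi"] >= 30:
        score += 0.2
    if x["hba1c"] >= 6.5 and x["bmi"] >= 30:
        score += 0.2  # interaction between the two indicators
    if x["age"] >= 50:
        score += 0.1
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average each feature's marginal effect
    over every order in which features can be switched from the
    baseline value to the instance value."""
    features = list(instance)
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)      # start from the reference record
        previous = model(current)
        for f in order:
            current[f] = instance[f]  # reveal one feature of the instance
            updated = model(current)
            totals[f] += updated - previous
            previous = updated
    return {f: total / len(orderings) for f, total in totals.items()}


# Hypothetical records (illustrative values only, not from the study's data).
patient = {"hba1c": 7.2, "bmi": 33.0, "age": 45}
reference = {"hba1c": 5.2, "bmi": 23.0, "age": 45}

phi = shapley_values(toy_model, patient, reference)
for name, value in phi.items():
    print(f"{name}: {value:+.2f}")
```

In this toy case the attributions are approximately 0.6 for HbA1c, 0.3 for BMI, and 0 for age, and they sum to the difference between the patient's score and the reference score. That additivity is the property that lets SHAP present a single prediction as a sum of per-feature contributions. Exact enumeration grows factorially with the number of features, so practical tools rely on model-specific or sampling-based approximations instead.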
This work is licensed under a Creative Commons Attribution 4.0 International License.