JOURNAL ARTICLE

Early Prediction of Diabetes Using an Ensemble of Machine Learning Models

Aishwariya Dutta, Md. Kamrul Hasan, Mohiuddin Ahmad, Md. Abdul Awal, Md. Akhtarul Islam, Mehedi Masud, Hossam Meshref

Year: 2022; Journal: International Journal of Environmental Research and Public Health; Vol: 19 (19); Pages: 12378-12378; Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Diabetes is one of the most rapidly spreading diseases in the world, resulting in an array of significant complications, including cardiovascular disease, kidney failure, diabetic retinopathy, and neuropathy, which contribute to increased morbidity and mortality. If diabetes is diagnosed at an early stage, its severity and underlying risk factors can be significantly reduced. However, reliable and effective diabetes prediction remains challenging because labeled clinical data are scarce and the available datasets often contain outliers and missing values. Therefore, we introduce a newly labeled diabetes dataset from a South Asian nation (Bangladesh). In addition, we propose an automated classification pipeline that includes a weighted ensemble of machine learning (ML) classifiers: Naive Bayes (NB), Random Forest (RF), Decision Tree (DT), XGBoost (XGB), and LightGBM (LGB). Grid search hyperparameter optimization is employed to tune the critical hyperparameters of these ML models. Furthermore, missing value imputation, feature selection, and K-fold cross-validation are included in the framework design. A statistical analysis of variance (ANOVA) test reveals that diabetes prediction performance improves significantly when the proposed weighted ensemble (DT + RF + XGB + LGB) is used with the introduced preprocessing, reaching the highest accuracy of 0.735 and an area under the ROC curve (AUC) of 0.832. In conjunction with the suggested ensemble model, our statistical imputation and RF-based feature selection techniques produced the best results for early diabetes prediction. Moreover, the new dataset will contribute to developing and implementing robust ML models for diabetes prediction using population-level data.
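
The weighted ensemble described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the ensemble weights are chosen, so this example assumes each model is weighted in proportion to its own AUC; that choice, the function names, and the toy probabilities are hypothetical and are not taken from the paper. It uses only the Python standard library and stands in for trained DT, RF, XGB, and LGB models with hard-coded positive-class probabilities.

```python
from typing import Dict, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def weighted_soft_vote(
    probs_by_model: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Average per-model positive-class probabilities using normalized weights."""
    total = sum(weights[name] for name in probs_by_model)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(next(iter(probs_by_model.values())))
    combined = [0.0] * n_samples
    for name, probs in probs_by_model.items():
        if len(probs) != n_samples:
            raise ValueError(f"model {name!r} has a different number of samples")
        w = weights[name] / total
        for i, p in enumerate(probs):
            combined[i] += w * p
    return combined


# Hypothetical validation labels and per-model probabilities (toy data only).
labels = [1, 0, 1, 0]
probs_by_model = {
    "DT": [1.0, 0.0, 0.0, 1.0],
    "RF": [0.8, 0.3, 0.6, 0.4],
    "XGB": [0.9, 0.2, 0.7, 0.5],
    "LGB": [0.7, 0.4, 0.8, 0.3],
}

# Assumption: weight each model by its own AUC on these labels.
weights = {name: roc_auc(labels, p) for name, p in probs_by_model.items()}

ensemble = weighted_soft_vote(probs_by_model, weights)
predictions = [int(p >= 0.5) for p in ensemble]
```

In this toy setup the weaker DT stand-in receives half the weight of the other three models, so its two wrong votes do not flip the ensemble's decisions. In practice the weights would be estimated on held-out folds rather than on the same data used for evaluation.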

Keywords:
Random forest; Naive Bayes classifier; Artificial intelligence; Ensemble learning; Computer science; Hyperparameter; Missing data; Feature selection; Machine learning; Decision tree; Diabetes mellitus; Receiver operating characteristic; Data mining; Statistics; Medicine; Mathematics; Support vector machine

Metrics

Cited By: 132
FWCI (Field Weighted Citation Impact): 42.15
Refs: 78
Citation Normalized Percentile: 1.00
Is in top 1%
Is in top 10%

Topics

Artificial Intelligence in Healthcare
Health Sciences →  Health Professions →  Health Information Management
Imbalanced Data Classification Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Machine Learning in Healthcare
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Diabetes Early Prediction Using Machine Learning and Ensemble Methods

Hyung-Ho Ha, H. Jin Kim, Young Hyun Yu, Hyun Sub Sim

Journal: International Journal on Advanced Science Engineering and Information Technology; Year: 2025; Vol: 15 (2); Pages: 363-375

JOURNAL ARTICLE

Diabetes Prediction Using Machine Learning Ensemble Model

Ong Yee Hang, Wiwied Virgiyanti, Rosly Rosaida

Journal: Journal of Advanced Research in Applied Sciences and Engineering Technology; Year: 2024; Vol: 37 (1); Pages: 82-98

JOURNAL ARTICLE

Diabetes Prediction Using Machine Learning Analytics: Ensemble Learning Techniques

Deeksha Tripathi, Saroj Kr. Biswas, S. Reshmi, Arpita Nath Boruah, Biswajit Purkayastha

Journal: 2022 2nd Asian Conference on Innovation in Technology (ASIANCON); Year: 2022; Pages: 1-7