JOURNAL ARTICLE

Leveraging machine learning for diabetes prediction: Ensemble model

Ogutu, McDonald Otieno; Kituku, Benson Nzioka; Karume, Simon M.

Year: 2025; Journal: Zenodo (CERN European Organization for Nuclear Research); Publisher: European Organization for Nuclear Research

Abstract

Diabetes presents a major global health challenge, and delayed diagnosis significantly impedes effective management, particularly in resource-constrained regions. This project aimed to enable timely and accurate diabetes prediction by developing an ensemble machine learning model. A hybrid dataset of 2,768 data points, compiled from the PIMA Indian (768 instances) and Hospital Frankfurt, Germany (2,000 instances) datasets, was used to improve generalizability beyond the limitations of a single source. The methodology involved comprehensive data preprocessing, including the imputation of physiologically impossible zero values and feature standardization. The F1-score was selected as the primary performance metric because it balances precision and recall, which is crucial in a medical context where both false positives and false negatives carry significant consequences. Six single-classifier models (Logistic Regression, Decision Tree, K-Nearest Neighbors, Support Vector Machine, Random Forest, and XGBoost) were trained on the data and evaluated after hyperparameter tuning. The F1-scores of the tuned models were: Logistic Regression (0.6328), Decision Tree (0.9843), K-Nearest Neighbors (0.9869), Support Vector Machine (0.9843), Random Forest (0.9947), and XGBoost (0.9974). Based on these results, XGBoost and Random Forest were selected as base learners for a Stacking Classifier ensemble with a Logistic Regression meta-learner. The ensemble achieved a near-perfect ROC-AUC of 0.9999 and an F1-score of 0.9974. This performance surpassed results reported in recent studies and highlighted the potential of machine learning to predict diabetes accurately. The project recommended further development of the ensemble model and its integration into a web application.
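The preprocessing step described in the abstract (treating physiologically impossible zeros as missing, imputing them, then standardizing features) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the PIMA column names (`Glucose`, `BloodPressure`, `SkinThickness`, `Insulin`, `BMI`, `Outcome`), and median imputation is an assumed strategy because the abstract does not name one. The helper `preprocess` and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Assumption: in the PIMA schema a 0 in these columns means "not measured",
# since a living patient cannot have zero glucose, blood pressure, or BMI.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def preprocess(df: pd.DataFrame, target: str = "Outcome"):
    """Hypothetical helper: impute impossible zeros, then standardize features."""
    X = df.drop(columns=[target]).copy()
    y = df[target].to_numpy()
    # Treat physiologically impossible zeros as missing values.
    X[ZERO_AS_MISSING] = X[ZERO_AS_MISSING].replace(0, np.nan)
    # Assumed strategy: fill each missing value with its column median.
    X_imputed = SimpleImputer(strategy="median").fit_transform(X)
    # Rescale every feature to zero mean and unit variance.
    X_scaled = StandardScaler().fit_transform(X_imputed)
    return X_scaled, y


# Toy rows for illustration only (not taken from either dataset).
toy = pd.DataFrame({
    "Pregnancies": [1, 0, 3, 2],
    "Glucose": [85, 0, 140, 120],
    "BloodPressure": [66, 70, 0, 80],
    "SkinThickness": [29, 0, 35, 20],
    "Insulin": [0, 94, 168, 0],
    "BMI": [26.6, 0.0, 43.1, 30.5],
    "DiabetesPedigreeFunction": [0.351, 0.167, 2.288, 0.5],
    "Age": [31, 21, 33, 45],
    "Outcome": [0, 0, 1, 1],
})
X, y = preprocess(toy)
print(X.shape)            # (4, 8)
print(np.isnan(X).any())  # False
```

In a real train/test workflow the imputer and scaler should be fit on the training split only (for example inside a scikit-learn `Pipeline`) so that test-set statistics do not leak into preprocessing.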
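The abstract's reason for choosing F1 (it balances precision and recall) can be made concrete with a small worked computation. F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R), which is equivalent to 2TP / (2TP + FP + FN). The confusion-matrix counts below are invented for illustration and are not results from the study.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 score from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp)  # share of flagged patients who truly have diabetes
    recall = tp / (tp + fn)     # share of diabetic patients who were flagged
    return 2 * precision * recall / (precision + recall)


# Hypothetical screening results on 100 diabetic patients: the first model
# rarely raises false alarms but misses many cases; the second balances both.
lopsided = f1_from_counts(tp=40, fp=2, fn=60)
balanced = f1_from_counts(tp=85, fp=15, fn=15)
print(round(lopsided, 4))  # 0.5634
print(round(balanced, 4))  # 0.85
```

Although the first model is right about 95% of the time when it flags diabetes, missing 60 of 100 diabetic patients pulls its F1 down to about 0.56, which is the kind of imbalance the metric is meant to expose.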
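The abstract states that the six single classifiers were evaluated after hyperparameter tuning, with F1 as the primary metric. One common way to combine the two is a cross-validated grid search scored by F1, sketched here for the Random Forest with scikit-learn's `GridSearchCV`. The abstract does not state the tuning method or search space, so the grid-search approach, the parameter grid, and the synthetic data are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary data standing in for the real features (assumption).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Illustrative search space; the study's actual grids are not given in the abstract.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",  # rank hyperparameter settings by mean cross-validated F1
    cv=5,
)
search.fit(X, y)
print(search.best_params_)
print(round(search.best_score_, 4))
```

With the default `refit=True`, `search.best_estimator_` holds the Random Forest refit on all the data with the winning parameters; the same pattern applies to the other classifiers by swapping the estimator and grid.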

Keywords:
Random forest; Ensemble learning; Decision tree; Support vector machine; Ensemble forecasting; Logistic regression; Generalizability theory; Overfitting; Naive Bayes classifier


Topics

Artificial Intelligence in Healthcare (Health Sciences → Health Professions → Health Information Management)
Machine Learning in Healthcare (Physical Sciences → Computer Science → Artificial Intelligence)
Digital Imaging for Blood Diseases (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)

Related Documents

JOURNAL ARTICLE

Leveraging machine learning for diabetes prediction: Ensemble model

McDonald Otieno Ogutu, Benson Kituku, Simon M. Karume

Journal: Global Journal of Engineering and Technology Advances; Year: 2025; Vol: 25 (1); Pages: 142-155

JOURNAL ARTICLE

Diabetes Prediction Using Machine Learning Ensemble Model

Ong Yee Hang, Wiwied Virgiyanti, Rosly Rosaida

Journal: Journal of Advanced Research in Applied Sciences and Engineering Technology; Year: 2024; Vol: 37 (1); Pages: 82-98

JOURNAL ARTICLE

Enhancing Diabetes Prediction Using Ensemble Machine Learning Model

Aniket K. Shahade, Priyanka V. Deshmukh

Journal: International Journal of Computing and Digital Systems; Year: 2024; Vol: 17 (1); Pages: 1-13

JOURNAL ARTICLE

Ensemble Machine Learning Approach for Diabetes Prediction

K R SriPreethaa, N. Yuvaraj, G. Jenifa

Journal: Innovations in Information and Communication Technology Series; Year: 2020; Pages: 482-486