Purpose: To build an effective machine learning (ML) model for predicting the risk of type 2 (non-insulin-dependent) diabetes mellitus (T2DM).
Methods: I developed two machine learning prediction models, one based on extreme gradient boosting (XGBoost) and one on logistic regression (LR). To evaluate the models I used the Pima Indian Diabetes dataset (PIDD), which originates from the National Institute of Diabetes and Digestive and Kidney Diseases and comprises 768 patients: 500 non-diabetic and 268 diabetic.
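To put the class balance of the dataset in context, the short calculation below derives the diabetes prevalence and the accuracy of a trivial classifier that always predicts the majority class. The class counts come from the Methods above; the majority-class baseline is an illustrative reference point added here, not something the study reports.

```python
# Class counts as stated in the Methods.
n_negative = 500  # non-diabetic patients
n_positive = 268  # diabetic patients

n_total = n_negative + n_positive

# Share of diabetic patients in the dataset.
prevalence = n_positive / n_total

# Illustrative assumption (not from the study): a classifier that always
# predicts the majority class gets this accuracy without learning anything.
majority_baseline_accuracy = max(n_negative, n_positive) / n_total

print(f"patients: {n_total}")
print(f"prevalence: {prevalence:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.1%}")
```

Because roughly two thirds of the patients are non-diabetic, accuracy alone can look favorable for a weak model, which is one reason to read it alongside sensitivity, specificity, precision, and F1-score.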
Results: Model performance was evaluated using six criteria. The XGBoost model outperformed logistic regression, achieving an area under the receiver operating characteristic curve (AUROC) of 85%, sensitivity of 71%, specificity of 81%, accuracy of 77%, precision of 67%, and an F1-score of 69%.
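As a minimal sketch of how the six criteria are defined, the code below computes the five threshold-based metrics from confusion-matrix counts and AUROC from predicted scores using the pairwise-ranking definition. The function names, the confusion-matrix counts, and the toy scores are hypothetical and chosen only for illustration; they are not the study's data or its implementation. The last lines recompute F1 from the reported precision and sensitivity as a consistency check.

```python
def classification_metrics(tp, fp, tn, fn):
    """Threshold-based metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall on the diabetic class
    specificity = tn / (tn + fp)          # recall on the non-diabetic class
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }


def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random
    negative; tied scores count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def f1_from(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion-matrix counts, NOT taken from the study.
example = classification_metrics(tp=40, fp=20, tn=80, fn=14)

# Hypothetical labels and predicted scores, NOT taken from the study.
toy_auroc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Consistency check on the reported values: precision 67%, sensitivity 71%.
reported_f1 = f1_from(0.67, 0.71)
print(f"F1 from reported precision and sensitivity: {reported_f1:.2f}")
```

The recomputed F1 rounds to 0.69, which agrees with the reported F1-score of 69%. AUROC is computed from scores rather than from a confusion matrix because it summarizes performance across all decision thresholds.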
Conclusion: This study showed that the XGBoost algorithm can be used to identify individuals at high risk of T2DM at an early stage, which has strong potential to support the control of diabetes mellitus.