Type II diabetes is a chronic metabolic disease secondary to elevated blood glucose levels. Complications of this disease include heart attack, stroke, blindness, renal failure, lower limb amputation and mortality. Due to its rising prevalence and consequent mortality, it is important to identify at an early stage those patients at high risk of developing diabetes. We applied 8 machine learning techniques namely: support vector machine, logistic regression, k-nearest neighbor, naïve Bayes, decision tree, random forest, AdaBoost and XGBoost in predicting diabetes using a publicly available diabetes dataset. In our study, Naïve Bayes with median imputation and recursive feature elimination obtained the highest performance with an accuracy rate of 81.0%. Although the results are very promising, one major limitation in this study is the small number of samples in the dataset. Early accurate detection can help patients to proactively monitor their lifestyle habits mitigating the risks of complications of uncontrolled diabetes.
Vincent Peter C. MagbooMa. Sheila A. Magboo
Damar WicaksonoAffix MaretaArdy ErdiyantoNuzula AfianahRizki Ramadhani
Akshay Ramesh bhai GuptaJitendra Agrawal