JOURNAL ARTICLE

Chi-Square and PCA Based Feature Selection for Diabetes Detection with Ensemble Classifier

Abstract

Diabetes mellitus is a metabolic disease ranked among the top 10 causes of death by the World Health Organization. Over the last few years, an alarming increase has been observed worldwide, with a 70% rise in the disease since 2000 and an 80% rise in male deaths. If untreated, it leads to complications in many vital organs of the human body, which may be fatal. Early detection of diabetes is therefore important so that treatment can start in time. This study introduces a methodology for classifying diabetic and normal people using an ensemble machine learning model and a feature fusion of Chi-square and principal component analysis. An ensemble model, the logistic tree classifier (LTC), is proposed, which combines logistic regression and an extra tree classifier through a soft voting mechanism. Experiments are also performed with several well-known machine learning algorithms to analyze their performance, including logistic regression, extra tree classifier, AdaBoost, Gaussian naive Bayes, decision tree, random forest, and k-nearest neighbor. In addition, several experiments are carried out using principal component analysis (PCA) and Chi-square (Chi-2) features to analyze the influence of feature selection on the performance of the machine learning classifiers. Results indicate that Chi-2 features perform better than both PCA features and the original features. However, the highest accuracy is obtained when the proposed ensemble model LTC is used with the proposed feature fusion framework, which achieves an accuracy score of 0.85, the highest among the available approaches for diabetes prediction. In addition, a statistical t-test confirms the statistical significance of the proposed approach over the other approaches.
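The abstract does not give implementation details, so the following is only a minimal sketch of how a Chi-2 and PCA feature fusion could feed a soft-voting ensemble of logistic regression and extra trees, using scikit-learn. Several things here are assumptions rather than the authors' settings: the synthetic data from `make_classification`, fusion by simple concatenation via `FeatureUnion`, the values `k=4` and `n_components=2`, the min-max scaling step, and all hyperparameters.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): 8 numeric features, binary label.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Feature fusion (assumed to be concatenation): the top-k features by
# Chi-square score are placed side by side with the leading PCA components.
fusion = FeatureUnion([
    ("chi2", SelectKBest(chi2, k=4)),
    ("pca", PCA(n_components=2, random_state=0)),
])

# Soft voting averages the class probabilities predicted by both models.
ltc = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",
)

model = Pipeline([
    ("scale", MinMaxScaler()),  # chi2 requires non-negative inputs
    ("fusion", fusion),
    ("ltc", ltc),
])
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
fused_train = model[:-1].transform(X_train)  # scaled, then fused features
print(f"fused feature count: {fused_train.shape[1]}, accuracy: {accuracy:.2f}")
```

The accuracy printed by this sketch reflects the synthetic data only and says nothing about the 0.85 score reported above.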

Keywords:
Random forest, Feature selection, Ensemble learning, Pattern recognition (psychology), Classifier (UML), Principal component analysis, Decision tree, Logistic regression, Support vector machine
