Optimizing Diabetes Classification with Support Vector Machine and SMOTEENN-based Feature Selection

Boon Feng Wee; Harn Hsem Hwong; Saaveethya Sivakumar; King Hann Lim; W. K. Wong; Ing Ming Chew; Meng Chung Tiong

doi:10.1109/icdate58146.2023.10248600

JOURNAL ARTICLE

Year: 2023 Vol: 157 Pages: 1-5

Abstract

Data-driven models for diabetes detection have gained much attention as a cost-effective and less invasive way to improve medical systems worldwide. Existing studies commonly apply statistical feature selection such as the Pearson correlation coefficient (PCC) or principal component analysis (PCA), which assume linear relationships and are therefore impractical for real-life diabetic data. This paper proposes a SMOTEENN-based univariate feature selection method for machine learning-based diabetes classification models. It combines the advantages of SMOTEENN oversampling and univariate feature selection to improve the classification rate with lower-dimensional input. The results imply that the proposed method is effective in achieving high classification accuracy: the Logistic Regression, Random Forest and Support Vector Machine-based models constructed in this research achieve accuracy of over 90% after feature selection, while also reducing the computational cost and time required for the classification tasks. A more extensive dataset should be considered and compared to further verify the method's effectiveness on this task.

Keywords:

Feature selection; Univariate; Random forest; Computer science; Support vector machine; Artificial intelligence; Oversampling; Machine learning; Feature (linguistics); Selection (genetic algorithm); Data mining; Logistic regression; Statistical classification; Multivariate statistics

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.20

Topics

Artificial Intelligence in Healthcare

Health Sciences → Health Professions → Health Information Management

Imbalanced Data Classification Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Machine Learning in Healthcare

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Feature Selection for Cancer Classification Based on Support Vector Machine

Deep Learning and SMOTEENN-based Univariate Feature Selection Approaches for Diabetes Classification

Feature Selection based Classification of Spams Using Fuzzy Support Vector Machine

Optimization Approach for Feature Selection and Classification with Support Vector Machine