JOURNAL ARTICLE

Imbalanced data classification using improved synthetic minority over-sampling technique

Yamijala AnushaR. VisalakshiKonda Srinivas

Year: 2023 Journal:   Multiagent and Grid Systems Vol: 19 (2)Pages: 117-131   Publisher: IOS Press

Abstract

In data mining, deep learning and machine learning models face class imbalance problems, which result in a lower detection rate for minority class samples. An improved Synthetic Minority Over-sampling Technique (SMOTE) is introduced for effective imbalanced data classification. After collecting the raw data from PIMA, Yeast, E.coli, and Breast cancer Wisconsin databases, the pre-processing is performed using min-max normalization, cleaning, integration, and data transformation techniques to achieve data with better uniqueness, consistency, completeness and validity. An improved SMOTE algorithm is applied to the pre-processed data for proper data distribution, and then the properly distributed data is fed to the machine learning classifiers: Support Vector Machine (SVM), Random Forest, and Decision Tree for data classification. Experimental examination confirmed that the improved SMOTE algorithm with random forest attained significant classification results with Area under Curve (AUC) of 94.30%, 91%, 96.40%, and 99.40% on the PIMA, Yeast, E.coli, and Breast cancer Wisconsin databases.

Keywords:
Computer science Support vector machine Artificial intelligence Machine learning Random forest Normalization (sociology) Decision tree Raw data Data mining

Metrics

8
Cited By
2.04
FWCI (Field Weighted Citation Impact)
42
Refs
0.86
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Imbalanced Data Classification Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Rough Sets and Fuzzy Logic
Physical Sciences →  Computer Science →  Computational Theory and Mathematics
© 2026 ScienceGate Book Chapters — All rights reserved.