JOURNAL ARTICLE

COMPARISON OF SMOTE RANDOM FOREST AND SMOTE K-NEAREST NEIGHBORS CLASSIFICATION ANALYSIS ON IMBALANCED DATA

Jus Prasetya, Abdurakhman Abdurakhman

Year: 2023   Journal: MEDIA STATISTIKA   Vol: 15 (2)   Pages: 198-208   Publisher: Diponegoro University

Abstract

In machine learning, classification analysis aims to minimize misclassification and maximize prediction accuracy. The defining characteristic of the imbalanced classification problem is that one class has significantly more samples than the other classes. SMOTE studies and extrapolates the minority class data to produce new synthetic samples. Random forest is a classification method that combines mutually independent classification trees. K-Nearest Neighbors is a classification method that labels a new sample based on its nearest neighbors. SMOTE generates synthetic data in the minority class, namely class 1 (cervical cancer), to 585 observations (samples), so that the total number of observations is 1208 samples. SMOTE random forest achieved an accuracy of 96.28%, sensitivity of 99.17%, specificity of 93.44%, precision of 93.70%, and AUC of 96.30%. SMOTE K-Nearest Neighbors achieved an accuracy of 87.60%, sensitivity of 77.50%, specificity of 97.54%, precision of 96.88%, and AUC of 82.27%. SMOTE random forest produces a perfect classification model and SMOTE K-Nearest Neighbors produces a good classification model, while random forest and K-Nearest Neighbors classification on the imbalanced data result in a failed classification model.
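The two ideas the abstract relies on, SMOTE oversampling and confusion-matrix metrics, can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names `smote` and `metrics`, the parameter `k`, the toy points, and the confusion-matrix counts are all assumptions introduced here. The abstract does not report a confusion matrix; the counts below are simply one hypothetical set that rounds to the percentages reported for SMOTE random forest.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbors.
    Assumes at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbors of `base` inside the minority class (excluding itself)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, and precision from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Toy minority class (hypothetical points, not the cervical cancer data)
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=6, k=2)

# Hypothetical counts, chosen only because they round to the reported
# SMOTE random forest percentages; they are not taken from the paper.
rf = metrics(tp=119, fn=1, tn=114, fp=8)
print({name: round(100 * value, 2) for name, value in rf.items()})
```

Because each synthetic point is a convex combination of two existing minority samples, it always lies on the segment between them, which is why SMOTE fills in the minority region rather than duplicating observations.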

Keywords:
Random forest; k-nearest neighbors algorithm; Artificial intelligence; Pattern recognition (psychology); Class (philosophy); Sample (material); Computer science; Sensitivity (control systems); Support vector machine; Statistical classification; Data mining; Mathematics; Engineering

Metrics

Cited By: 6
FWCI (Field Weighted Citation Impact): 1.53
Refs: 13
Citation Normalized Percentile: 0.81

Topics

Imbalanced Data Classification Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Data Mining and Machine Learning Applications
Physical Sciences →  Computer Science →  Information Systems
Data Mining Algorithms and Applications
Physical Sciences →  Computer Science →  Information Systems

Related Documents

JOURNAL ARTICLE

Classification of Imbalanced Big Data using SMOTE with Rough Random Forest

T. Das, Abhinandan Khan, Goutam Saha

Journal: International Journal of Engineering and Advanced Technology   Year: 2019   Vol: 9 (2)   Pages: 5174-5184

JOURNAL ARTICLE

PDR-SMOTE: an imbalanced data processing method based on data region partition and K nearest neighbors

Hongfang Zhou, Zongling Wu, Ningning Xu, Hao Xiao

Journal: International Journal of Machine Learning and Cybernetics   Year: 2023   Vol: 14 (12)   Pages: 4135-4150

JOURNAL ARTICLE

Optimization of the K-Nearest Neighbors (KNN) Algorithm in Imbalanced Dataset Classification Using the SMOTE Technique

Aulia Risyda Fauzi, Ahmad Faqih, Kaslani

Journal: Journal of Artificial Intelligence and Engineering Applications (JAIEA)   Year: 2025   Vol: 4 (2)   Pages: 808-814