JOURNAL ARTICLE

Addressing imbalanced data classification with Cluster-Based Reduced Noise SMOTE

Javad Hemmatian, Rassoul Hajizadeh, Fakhroddin Nazari

Year: 2025 Journal: PLoS ONE Vol: 20 (2) Pages: e0317396-e0317396 Publisher: Public Library of Science

Abstract

In recent years, the challenge of imbalanced data has become increasingly prominent in machine learning, degrading the performance of classification algorithms. This study proposes a novel data-level oversampling method called Cluster-Based Reduced Noise SMOTE (CRN-SMOTE) to address this issue. CRN-SMOTE combines SMOTE for oversampling minority classes with a novel cluster-based noise reduction technique. In this cluster-based noise reduction approach, it is crucial that samples from each category form one or two clusters, a property that conventional noise reduction methods do not achieve. The proposed method is evaluated on four imbalanced datasets (ILPD, QSAR, Blood, and Maternal Health Risk) using five metrics: Cohen’s kappa, Matthews correlation coefficient (MCC), F1-score, precision, and recall. Results demonstrate that CRN-SMOTE consistently outperformed the state-of-the-art Reduced Noise SMOTE (RN-SMOTE), SMOTE-Tomek Link, and SMOTE-ENN methods across all datasets, with particularly notable improvements on the QSAR and Maternal Health Risk datasets, indicating its effectiveness in enhancing imbalanced classification performance. Overall, the experimental findings indicate that CRN-SMOTE outperformed RN-SMOTE in 100% of the cases, achieving average improvements of 6.6% in kappa, 4.01% in MCC, 1.87% in F1-score, 1.7% in precision, and 2.05% in recall, with SMOTE’s number of neighbors set to 5.
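
The abstract does not describe the cluster-based noise reduction step in enough detail to reproduce it, so the sketch below covers only the standard SMOTE interpolation that CRN-SMOTE builds on: a synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbours. The function name smote_sketch, its parameters, and the toy data are illustrative assumptions and are not taken from the paper; only k = 5 reflects the neighbour setting mentioned above.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by plain SMOTE-style interpolation.

    minority: list of equal-length feature tuples from the minority class.
    k: number of nearest minority neighbours considered per base sample.
    This is NOT CRN-SMOTE; no noise reduction is performed here.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy minority class with two features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7), (1.4, 1.0)]
new_points = smote_sketch(minority, n_new=4, k=5)
```

Because every synthetic point is a convex combination of two existing minority samples, it stays inside the bounding box of the minority class. That is also why noisy minority samples matter: any interpolation that starts from a noisy point propagates the noise, which is the motivation for pairing SMOTE with a noise reduction step.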

Keywords:
Oversampling; Computer science; Noise (video); Data mining; Artificial intelligence; Pattern recognition (psychology); Machine learning; Noise reduction

Metrics

Cited By: 8
FWCI (Field Weighted Citation Impact): 38.56
Refs: 54
Citation Normalized Percentile: 0.99
Is in top 1%
Is in top 10%

Topics

Imbalanced Data Classification Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Machine Learning and Data Classification
Physical Sciences → Computer Science → Artificial Intelligence
Data Mining Algorithms and Applications
Physical Sciences → Computer Science → Information Systems

Related Documents

JOURNAL ARTICLE

RN-SMOTE: Reduced Noise SMOTE based on DBSCAN for enhancing imbalanced data classification

Ahmed Arafa, Nawal El‐Fishawy, Mohammed Badawy, Marwa Radad

Journal: Journal of King Saud University - Computer and Information Sciences Year: 2022 Vol: 34 (8) Pages: 5059-5074

JOURNAL ARTICLE

A Cluster Based Classification for Imbalanced Data Using SMOTE

Rajesh Kumar Tripathi, Linesh Raja, Ankit Kumar, Pankaj Dadheech, Abhishek Kumar, M N Nachappa

Journal: IOP Conference Series: Materials Science and Engineering Year: 2021 Vol: 1099 (1) Pages: 012080-012080

JOURNAL ARTICLE

SMOTE-LOF for noise identification in imbalanced data classification

Asniar Asniar, Nur Ulfa Maulidevi, Kridanto Surendro

Journal: Journal of King Saud University - Computer and Information Sciences Year: 2021 Vol: 34 (6) Pages: 3413-3423