JOURNAL ARTICLE

Cluster-Based Minority Over-Sampling for Imbalanced Datasets

Kamthorn Puntumapon, Thanawin RAKTHAMAMON, Kitsana Waiyamai

Year: 2016 Journal: IEICE Transactions on Information and Systems Vol: E99.D (12) Pages: 3101-3109 Publisher: Institute of Electronics, Information and Communication Engineers

Abstract

Synthetic over-sampling is a well-known method to solve class imbalance by modifying class distribution and generating synthetic samples. A large number of synthetic over-sampling techniques have been proposed; however, most of them suffer from the over-generalization problem whereby synthetic minority class samples are generated into the majority class region. Learning from an over-generalized dataset, a classifier could misclassify a majority class member as belonging to a minority class. In this paper a method called TRIM is proposed to overcome the over-generalization problem. The idea is to identify minority class regions that compromise between generalization and overfitting. TRIM identifies all the minority class regions in the form of clusters. Then, it merges a large number of small minority class clusters into more generalized clusters. To enhance the generalization ability, a cluster connection step is proposed to avoid over-generalization toward the majority class while increasing generalization of the minority class. As a result, the classifier is able to correctly classify more minority class samples while maintaining its precision. Compared with SMOTE and extended versions such as Borderline-SMOTE, experimental results show that TRIM exhibits significant performance improvement in terms of F-measure and AUC. TRIM can be used as a pre-processing step for synthetic over-sampling methods such as SMOTE and its extended versions.
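The over-generalization problem described in the abstract can be illustrated with a minimal sketch. It is not the TRIM algorithm: the cluster identification, merging, and connection steps are not reproduced here. The sketch only shows SMOTE-style interpolation applied separately inside each minority region, under the assumption that some earlier step has already produced those regions. All function names, parameters, and data points below are hypothetical.

```python
import math
import random


def smote_like_samples(points, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between a randomly
    chosen point and one of its k nearest neighbours (SMOTE-style)."""
    if len(points) < 2:
        raise ValueError("need at least two points to interpolate")
    rng = random.Random(seed)
    k = min(k, len(points) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(points))
        base = points[i]
        # Indices of the other points, ordered by distance to the base point.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(base, points[j]),
        )
        neighbour = points[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def oversample_per_region(regions, n_new_per_region, k=3, seed=0):
    """Apply the interpolation separately inside each minority region, so no
    synthetic point is drawn between two different regions."""
    out = []
    for idx, region in enumerate(regions):
        out.extend(smote_like_samples(region, n_new_per_region, k, seed + idx))
    return out


# Two hypothetical, hand-made minority regions (not data from the paper).
region_a = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)]
region_b = [(5.0, 5.0), (5.2, 5.1), (5.1, 4.8)]
new_points = oversample_per_region([region_a, region_b], n_new_per_region=4)
```

Because every synthetic point is a convex combination of two points from the same region, it stays inside that region's bounding box. Interpolating over the pooled minority set instead could place samples in the space between the two regions, which is where majority class samples may lie.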

Keywords:
Overfitting, Computer science, Generalization, Classifier (UML), Artificial intelligence, Class (philosophy), Machine learning, Trim, Oversampling, Pattern recognition (psychology), Data mining, Algorithm, Mathematics, Artificial neural network, Bandwidth (computing)

Metrics

Cited By: 23
FWCI (Field Weighted Citation Impact): 0.00
Refs: 18
Citation Normalized Percentile: 0.07

Topics

Imbalanced Data Classification Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Anomaly Detection Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence
Electricity Theft Detection Techniques
Physical Sciences →  Engineering →  Electrical and Electronic Engineering

Related Documents

JOURNAL ARTICLE

Navo Minority Over-sampling Technique (NMOTe): A Consistent Performance Booster on Imbalanced Datasets

Navoneel Chakrabarty, Sanket Biswas

Journal: Journal of Electronics and Informatics Year: 2020 Vol: 2 (2) Pages: 96-136

BOOK-CHAPTER

Margin-Based Over-Sampling Method for Learning from Imbalanced Datasets

Xiannian Fan, Ke Tang, Thomas Weise

Lecture notes in computer science Year: 2011 Pages: 309-320

BOOK-CHAPTER

Cluster-Based Under-Sampling Using Farthest Neighbour Technique for Imbalanced Datasets

G. Rekha, Amit Kumar Tyagi

Advances in intelligent systems and computing Year: 2020 Pages: 35-44

JOURNAL ARTICLE

An Analysis Of Classification Of Imbalanced Datasets By Using Synthetic Minority Over-Sampling Technique

Ghada Alfattni

Journal: Zenodo (CERN European Organization for Nuclear Research) Year: 2016 Vol: 10 (6) Pages: 1071-1074

JOURNAL ARTICLE

COMPARATIVE ANALYSIS OF CLUSTER CONCENTRIC CIRCLE BASED UNDER SAMPLING OVER LOW VERSUS HIGH DIMENSIONAL IMBALANCED DATASETS

S. Srividhya

Journal: International Journal of Advanced Research in Computer Science Year: 2017 Vol: 8 (8) Pages: 433-437