JOURNAL ARTICLE

DPC-SMOTE Over-sampling Algorithm for Imbalanced Data Classification

LIU Zhihan, ZHANG Zhonglin, ZHAO Lei

Year: 2024
Journal: DOAJ (Directory of Open Access Journals)

Abstract

An oversampling algorithm based on density peak clustering (DPC) is proposed to address noise and between-class imbalance in imbalanced data sets. First, the majority-class samples are preprocessed: noise samples are screened out and deleted. Second, density peak clustering is applied to all minority-class samples, and noise points are removed. Sampling weights are then assigned according to the sparsity of each cluster, and the number of new samples to synthesize for each cluster is calculated from these weights. Finally, SMOTE oversampling is performed within each cluster to synthesize the new samples. The proposed algorithm is compared with five common oversampling algorithms; each algorithm is combined with five base classifiers, and comparison experiments are carried out on six imbalanced data sets. The experimental results show that the method improves F1, G-mean and AUC by at least 1.21%, 0.94% and 5.14%, respectively, and by up to 15.90%, 14.99% and 11.26%. This indicates that the method reduces sample overlap, effectively avoids generating noise in imbalanced data sets, and improves classification accuracy.
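
The minority-class clustering step can be sketched with the standard density peak formulation: each sample gets a local density ρ (the number of other samples within a cutoff distance d_c) and a distance δ to its nearest denser sample; samples with both large ρ and large δ become cluster centres, and every other sample inherits the label of its nearest denser neighbour. The abstract does not state the cutoff, the centre-selection rule, or the noise criterion, so the function name `density_peak_clustering` and the parameters `dc`, `dc_percentile`, `n_clusters` and `min_density` are hypothetical, and this sketch simply treats minority samples with no neighbour inside d_c as noise. It illustrates the technique under those assumptions and is not the authors' implementation.

```python
import numpy as np


def density_peak_clustering(X, n_clusters=2, dc=None, dc_percentile=20.0, min_density=1):
    """Density peak clustering of minority samples; label -1 marks noise.

    Hypothetical choices (not from the paper): a cutoff kernel for rho, d_c taken
    as a percentile of pairwise distances when not given, the densest point plus
    the next largest rho * delta points as centres, rho < min_density as noise.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distance matrix.
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    if dc is None:
        dc = np.percentile(dist[np.triu_indices(n, k=1)], dc_percentile)
    # Local density rho: number of other points closer than d_c.
    rho = (dist < dc).sum(axis=1) - 1
    # Visit points from densest to sparsest (stable sort breaks ties by index).
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # By convention the densest point gets its largest distance.
            delta[i] = dist[i].max()
            continue
        denser = order[:rank]  # points visited earlier (density >= rho[i])
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j
    # Centres: the densest point plus the largest gamma = rho * delta values.
    gamma = rho * delta
    candidates = [i for i in np.argsort(-gamma, kind="stable") if i != order[0]]
    centres = [order[0]] + candidates[: n_clusters - 1]
    labels = np.full(n, -1)
    for k, c in enumerate(centres):
        labels[c] = k
    # Each remaining point joins the cluster of its nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    # Isolated minority points are flagged as noise and excluded from sampling.
    labels[rho < min_density] = -1
    return labels
```

Forcing the densest point to be a centre guarantees that every chain of "nearest denser neighbour" links ends at a labelled point. The full distance matrix costs O(n²) memory, which is usually acceptable for a minority class but would call for a neighbour index on large inputs.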

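The weighting and per-cluster SMOTE steps can be sketched as follows. The abstract only says that weights follow the sparsity of each cluster, so the sparsity score used here (mean within-cluster pairwise distance divided by cluster size), the largest-remainder rounding in `allocate`, the target of synthesizing until the minority count equals `n_majority`, and all function names are assumptions made for illustration. Within each cluster, new samples use standard SMOTE interpolation, x_new = x + u(x_nn - x) with u drawn uniformly from [0, 1), between a sample and one of its k nearest neighbours in the same cluster, so synthetic points never bridge two clusters.

```python
import numpy as np


def pairwise_dist(P):
    return np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1))


def cluster_sparsity(P):
    """Hypothetical sparsity score: mean pairwise distance divided by cluster size."""
    n = len(P)
    mean_dist = pairwise_dist(P).sum() / (n * (n - 1))  # zero diagonal excluded
    return mean_dist / n


def allocate(weights, total):
    """Split `total` samples in proportion to `weights` (largest-remainder rounding)."""
    raw = weights * total
    counts = np.floor(raw).astype(int)
    leftover = int(total - counts.sum())
    # Give the leftover samples to the clusters with the largest fractional parts.
    for idx in np.argsort(-(raw - counts), kind="stable")[:leftover]:
        counts[idx] += 1
    return counts


def smote_in_cluster(P, n_new, k, rng):
    """Classic SMOTE restricted to one cluster: interpolate towards a random k-NN."""
    d = pairwise_dist(P)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    k = min(k, len(P) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(P), size=n_new)
    nn = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # uniform in [0, 1)
    return P[base] + gap * (P[nn] - P[base])


def dpc_smote_oversample(X_min, labels, n_majority, k=5, seed=None):
    """Synthesize minority samples up to n_majority, spread by cluster sparsity."""
    rng = np.random.default_rng(seed)
    X_min, labels = np.asarray(X_min, dtype=float), np.asarray(labels)
    keep = labels != -1  # drop points flagged as noise by the clustering step
    X_min, labels = X_min[keep], labels[keep]
    clusters = [X_min[labels == c] for c in np.unique(labels)]
    # Singleton clusters cannot be interpolated, so they get zero weight.
    sparsity = np.array([cluster_sparsity(P) if len(P) > 1 else 0.0 for P in clusters])
    total = max(n_majority - len(X_min), 0)
    if total == 0 or sparsity.sum() == 0:
        return np.empty((0, X_min.shape[1]))
    counts = allocate(sparsity / sparsity.sum(), total)
    parts = [smote_in_cluster(P, m, k, rng) for P, m in zip(clusters, counts) if m > 0]
    return np.vstack(parts)
```

Because interpolation stays inside one cluster, sparse clusters receive more synthetic samples while no new point lands in the gap between clusters, which is one plausible reading of how the method reduces sample overlap.
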
Keywords:
Oversampling, Cluster analysis, Noise, Pattern recognition, Sampling, Cluster, Sample

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.31

Topics

Imbalanced Data Classification Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Machine Learning and Data Classification
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Computing and Algorithms
Social Sciences →  Social Sciences →  Urban Studies
