LIU Zhihan, ZHANG Zhonglin, ZHAO Lei
An oversampling algorithm based on density peak clustering is proposed to address noise and between-class imbalance in imbalanced data sets. First, the majority-class samples are preprocessed, and noise samples are screened out and deleted. Second, density peak clustering is applied to all minority-class samples, and noise points are removed. Sampling weights are then assigned according to the sparsity of each cluster, and the number of new samples to synthesize for each cluster is calculated. SMOTE oversampling is performed within each cluster to synthesize the new samples. The proposed algorithm is compared with five common oversampling algorithms, each combined with five base classifiers, in experiments on six imbalanced data sets. The experimental results show that the F1, G-mean, and AUC of this method improve by at least 1.21%, 0.94%, and 5.14%, respectively, and by as much as 15.90%, 14.99%, and 11.26%. These results indicate that the method can reduce sample overlap, effectively avoid generating noise in imbalanced data sets, and improve classification accuracy.
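The clustering, weighting, and per-cluster SMOTE steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the Gaussian-kernel density with cutoff `dc`, the use of mean pairwise distance as the sparsity measure, and the rule that sparser clusters receive proportionally more synthetic samples are all assumptions made for illustration, and both noise-filtering steps are omitted.

```python
import numpy as np

def _pairwise(X):
    # Euclidean distance matrix between all rows of X.
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

def density_peak_labels(X, n_clusters, dc):
    """Simplified density peak clustering; dc is an assumed cutoff distance."""
    d = _pairwise(X)
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0   # local density, self excluded
    order = np.argsort(-rho)                          # indices by decreasing density
    n = len(X)
    delta = np.zeros(n)
    parent = np.full(n, -1)
    delta[order[0]] = d[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]                         # points with higher density
        j = higher[np.argmin(d[i, higher])]
        delta[i], parent[i] = d[i, j], j
    gamma = rho * delta
    gamma[order[0]] = np.inf                          # densest point is always a center
    centers = np.argsort(-gamma)[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:                                   # parents are labelled before children
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

def allocate(X, labels, n_new):
    """Split n_new across clusters in proportion to sparsity (assumed rule)."""
    ks = np.unique(labels)
    w = np.zeros(len(ks))
    for idx, c in enumerate(ks):
        Xc = X[labels == c]
        if len(Xc) > 1:                               # mean pairwise distance as sparsity
            w[idx] = _pairwise(Xc).sum() / (len(Xc) * (len(Xc) - 1))
    w = w / w.sum() if w.sum() > 0 else np.full(len(ks), 1.0 / len(ks))
    counts = np.floor(w * n_new).astype(int)
    rem = n_new - counts.sum()                        # hand out the rounding remainder
    counts[np.argsort(-(w * n_new - counts))[:rem]] += 1
    return ks, counts

def smote_within(Xc, n_new, k, rng):
    """SMOTE-style interpolation restricted to one cluster."""
    if n_new == 0 or len(Xc) < 2:
        return np.empty((0, Xc.shape[1]))
    d = _pairwise(Xc)
    np.fill_diagonal(d, np.inf)                       # a point is not its own neighbour
    k = min(k, len(Xc) - 1)
    nn = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(Xc), size=n_new)
    nb = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return Xc[base] + gap * (Xc[nb] - Xc[base])

def dpc_smote(X_min, n_new, n_clusters, dc, k=5, seed=0):
    """Cluster the minority samples, then oversample each cluster separately."""
    rng = np.random.default_rng(seed)
    labels = density_peak_labels(X_min, n_clusters, dc)
    ks, counts = allocate(X_min, labels, n_new)
    parts = [smote_within(X_min[labels == c], m, k, rng) for c, m in zip(ks, counts)]
    return np.vstack(parts), labels
```

Because interpolation only happens between neighbours inside the same cluster, synthetic points cannot land in the empty region between two minority clusters, which is the overlap-reduction effect the abstract describes. In practice `dc` and `n_clusters` would need tuning per data set.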
Lawrence Chuin Ming Liaw, Shing Chiang Tan, Pey Yun Goh, Chee Peng Lim
Firza Refo Adi Pratama, Siskarossa Ika Oktora
Lawrence Chuin Ming Liaw, Shing Chiang Tan, Pey Yun Goh, Chee Peng Lim
Jiaqi Guo, Haiyan Wu, Xiaolei Chen, Weiguo Lin