For imbalanced datasets, the goal of classification is to correctly identify samples of the minority class. Conventional data mining algorithms perform poorly on such datasets. The synthetic minority over-sampling technique (SMOTE) is designed specifically for learning from imbalanced datasets: it generates synthetic minority class examples by interpolating between neighboring minority class examples. However, SMOTE suffers from overgeneralization, because synthetic examples can be placed in regions that belong to the majority class. Density-based spatial clustering of applications with noise (DBSCAN) does not handle samples near the class borderline rigorously, so we optimize the DBSCAN algorithm to make the clustering more reasonable. This paper integrates the optimized DBSCAN with SMOTE and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN divides the minority class samples into three groups: core samples, borderline samples, and noise samples. The noise samples are then removed so that more effective samples can be synthesized. To make full use of the information carried by core and borderline samples, different over-sampling strategies are applied to each group. Experiments show that DSMOTE achieves better precision, recall, and F-value than SMOTE and Borderline-SMOTE.
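The pipeline described above (group minority samples by density, drop noise, then over-sample core and borderline samples differently) can be illustrated with a minimal sketch. It makes several assumptions: it uses the standard DBSCAN point roles rather than the paper's optimized DBSCAN, whose details are not given here; core samples use classic SMOTE interpolation; and borderline samples follow a hypothetical rule of interpolating only toward nearby core samples. The function names `label_minority` and `dsmote_sketch` and the parameters `eps`, `min_pts`, and `k` are illustrative, not the authors' code.

```python
import math
import random


def label_minority(minority, eps, min_pts):
    """Assign standard DBSCAN roles to each minority sample (not the paper's
    optimized variant). A sample is 'core' if at least min_pts samples
    (itself included) lie within distance eps, 'border' if it is not core
    but lies within eps of a core sample, and 'noise' otherwise.
    """
    n = len(minority)
    neighbors = [
        [j for j in range(n) if math.dist(minority[i], minority[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            labels.append("border")
        else:
            labels.append("noise")
    return labels


def dsmote_sketch(minority, n_new, eps, min_pts, k=5, seed=0):
    """Hypothetical DSMOTE-style over-sampler (illustration only).

    Noise samples are dropped. Core seeds use classic SMOTE over all kept
    minority samples; border seeds (an assumed rule) interpolate only toward
    nearby core samples. May return fewer than n_new points if a seed has
    no eligible neighbor.
    """
    rng = random.Random(seed)
    labels = label_minority(minority, eps, min_pts)
    core = [i for i, lab in enumerate(labels) if lab == "core"]
    kept = [i for i, lab in enumerate(labels) if lab != "noise"]
    if not core:
        return []  # no dense minority region to over-sample from
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(kept)
        pool = kept if labels[i] == "core" else core
        candidates = [j for j in pool if j != i]
        if not candidates:
            continue
        # k nearest eligible neighbors of the seed, as in SMOTE
        nearest = sorted(candidates, key=lambda j: math.dist(minority[i], minority[j]))[:k]
        j = rng.choice(nearest)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


minority = [[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [0.35, 0.1], [5, 5]]
print(label_minority(minority, eps=0.3, min_pts=4))
# ['core', 'core', 'core', 'core', 'border', 'noise']
print(len(dsmote_sketch(minority, n_new=20, eps=0.3, min_pts=4)))
# 20
```

In this toy example the isolated point `[5, 5]` is labeled noise and never used as a seed or neighbor, so no synthetic sample lands near it. Restricting borderline seeds to core neighbors keeps their synthetic points on the minority side of the border, which is one plausible way to limit overgeneralization; the strategies used in the paper itself may differ.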
LIU Zhihan, ZHANG Zhonglin, ZHAO Lei