Imbalanced data sets are common in real-world applications, and in such problems the minority class is usually the class of primary interest. Traditional support vector machines (SVMs), however, classify imbalanced data sets poorly. To improve recognition accuracy on the minority class, an over-sampling method that combines probability density function estimation with Gibbs sampling is proposed. First, the probability density function of the minority class is estimated with the Parzen window method. Gibbs sampling is then used to generate new samples from the estimated density, so that they follow the distribution of the minority samples and yield a relatively balanced training data set. Finally, a support vector machine is trained on the new data set. Experimental results on a synthetic data set and five UCI benchmark data sets demonstrate the effectiveness of the proposed method.
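The over-sampling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian product kernel with a single fixed bandwidth `h` for the Parzen window, and the function name `gibbs_oversample` and the parameters `burn_in` and `seed` are hypothetical choices made for illustration. Under this assumption, the conditional density of one coordinate given the others is a one-dimensional Gaussian mixture, so each Gibbs update can be drawn exactly.

```python
import math
import random


def gibbs_oversample(minority, n_new, h=0.5, burn_in=50, seed=0):
    """Draw n_new synthetic points from a Gaussian Parzen-window estimate
    of the minority-class density using Gibbs sampling.

    Assumptions (not from the paper): Gaussian product kernel, one shared
    bandwidth h, no thinning between retained samples.
    """
    rng = random.Random(seed)
    dim = len(minority[0])
    x = list(rng.choice(minority))  # start the chain at a real minority sample
    samples = []
    for step in range(burn_in + n_new):
        for d in range(dim):
            # Log-weight of each kernel centre given the other coordinates.
            log_w = [
                -sum((x[j] - c[j]) ** 2 for j in range(dim) if j != d)
                / (2 * h * h)
                for c in minority
            ]
            m = max(log_w)  # subtract the max for numerical stability
            weights = [math.exp(lw - m) for lw in log_w]
            # p(x_d | other coordinates) is a 1-D Gaussian mixture:
            # pick a component, then sample around its d-th coordinate.
            centre = rng.choices(minority, weights=weights, k=1)[0]
            x[d] = rng.gauss(centre[d], h)
        if step >= burn_in:
            samples.append(tuple(x))
    return samples


# Usage: generate 20 synthetic minority samples from 4 real ones.
minority = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (0.1, -0.2)]
new_points = gibbs_oversample(minority, n_new=20, h=0.1)
```

The synthetic points would then be appended to the minority class before training the SVM on the rebalanced set. Consecutive draws from a Gibbs chain are correlated, so a practical version might keep only every k-th sample and would need to select the bandwidth from the data rather than fix it by hand.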
Ming Gao, Xia Hong, Sheng Chen, C.J. Harris
Xiannian Fan, Ke Tang, Thomas Weise
Kamthorn Puntumapon, Thanawin RAKTHAMAMON, Kitsana Waiyamai
Zhaoke Huang, Chunhua Yang, Xiaofang Chen, Keke Huang, Yongfang Xie
Ginny Y. Wong, F.H.F. Leung, Sai Ho Ling