Dong ZhangXiang HuangGen LiShengjie KongDong Liang
In view of the data of fault diagnosis and good product testing in the industrial field, high-noise unbalanced data samples exist widely, and such samples are very difficult to analyze in the field of data analysis. The oversampling technique has proved to be a simple solution to unbalanced data in the past, but it has no significant resistance to noise. In order to solve the binary classification problem of high-noise unbalanced data, an enhanced majority-weighted minority oversampling technique, MWMOTE-FRIS-INFFC, is introduced in this study, which is specially used for processing noise-unbalanced classified data sets. The method uses Euclidean distance to assign sample weights, synthesizes and combines new samples into samples with larger weights but belonging to a few classes, and thus solves the problem of data scarcity in smaller class clusters. Then, the fuzzy rough instance selection (FRIS) method is used to eliminate the subsets of synthetic minority samples with low clustering membership, which effectively reduces the overfitting tendency of minority samples caused by synthetic oversampling. In addition, the integration of classification fusion iterative filters (INFFC) helps mitigate synthetic noise issues, both raw data and synthetic data noise. On this basis, a series of experiments are designed to improve the performance of 6 oversampling algorithms on 8 data sets by using the MWMOTE-FRIS-INFFC algorithm proposed in this paper.
Sukarna BaruaMd. Monirul IslamXin YaoKazuyuki Murase
Jianan WeiHaisong HuangLiguo YaoYao HuQingsong FanDong Huang
Chen TianLijuan ZhouShudong ZhangYixuan Zhao
Fei HanChuanzhen WangQing-Hua LingHenry Han