Many datasets across various applications are imbalanced, with majority-class samples dominating minority-class samples. This skewed distribution poses a difficulty for existing learning approaches. Oversampling techniques address the problem by replicating original minority samples or by adding new synthetic ones. Despite their success, they suffer from over-generation and class overlap. In this paper, we propose an entropy difference-based oversampling approach (EDOS) for imbalanced learning that uses a novel metric, termed entropy difference (ED). First, given a dataset, EDOS measures the degree of imbalance between the majority and minority classes with ED. Second, EDOS creates synthetic minority samples; for each one, it evaluates whether the sample is worth retaining and keeps only the informative samples. Third, the original samples and the qualified synthetic samples are combined to train the classifiers. Experiments on several UCI datasets demonstrate the effectiveness of the proposed EDOS method.
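The three steps above can be illustrated with a minimal sketch. The abstract does not define ED, so the code below substitutes a stand-in: the gap between the maximum label entropy for two classes (1 bit) and the observed label entropy. The retention test (a candidate must reduce this stand-in ED and must lie no farther from the minority class than from the majority class) and the linear interpolation used to create candidates are likewise assumptions made for illustration, as are all function names; none of this should be read as the EDOS definition itself.

```python
import math
import random


def class_entropy(counts):
    """Shannon entropy (in bits) of a class-count distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_difference(n_major, n_minor):
    """Stand-in ED (assumption): 1 bit minus the observed label entropy.

    It is 0 for a perfectly balanced two-class dataset and grows with imbalance.
    """
    return 1.0 - class_entropy([n_major, n_minor])


def nearest_distance(point, samples):
    """Euclidean distance from point to its nearest neighbour in samples."""
    return min(math.dist(point, s) for s in samples)


def oversample(majority, minority, rng, max_tries=1000):
    """Generate synthetic minority samples and keep only those that qualify."""
    kept = []
    # Step 1: measure the imbalance degree of the original data.
    ed = entropy_difference(len(majority), len(minority))
    for _ in range(max_tries):
        if len(minority) + len(kept) >= len(majority):
            break  # stop at balance to avoid over-generation
        # Step 2: create a candidate by interpolating two minority samples
        # (assumed generator, in the style of SMOTE).
        a, b = rng.sample(minority, 2)
        t = rng.random()
        candidate = [x + t * (y - x) for x, y in zip(a, b)]
        new_ed = entropy_difference(len(majority), len(minority) + len(kept) + 1)
        # Retain the candidate only if it reduces ED and does not overlap
        # the majority region (assumed retention rule).
        reduces_ed = new_ed < ed
        no_overlap = nearest_distance(candidate, minority) <= nearest_distance(candidate, majority)
        if reduces_ed and no_overlap:
            kept.append(candidate)
            ed = new_ed
    # Step 3 (not shown): train a classifier on majority + minority + kept.
    return kept
```

For a training set, the returned list would be appended to the original minority samples before fitting any classifier. A faithful implementation would replace `entropy_difference` and the retention rule with the definitions given in the body of the paper.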
Michał Koziarski, Bartosz Krawczyk, Michał Woźniak
Kyung-Min Kim, Ha-Young Jang, Byoung‐Tak Zhang