Hien M. Nguyen, Eric W. Cooper, Katsuari Kamei
Learning from imbalanced data has conventionally been conducted on stationary data sets. Recently, several methods have been proposed for mining imbalanced data streams, in which training data is read in consecutive data chunks. Each data chunk is treated as a conventional imbalanced data set, which makes it easy to apply sampling methods to balance it. However, one drawback of chunk-based learning methods is that the classification model cannot be updated until a full data chunk has been received. This paper therefore proposes a new method for online learning from imbalanced data streams that uses naive Bayes as the base learner. To handle class imbalance, every new training instance from the minority class is used for learning, whereas an instance from the majority class is used only with a small probability. In effect, the method amounts to under-sampling applied to an imbalanced data stream. We show the effectiveness of the proposed online learning method on ten UCI data sets from various domains. Problems with the performance of naive Bayes on imbalanced data sets are also discussed.
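The sampling rule described above can be illustrated with a minimal sketch. The abstract does not specify the feature model, smoothing scheme, or sampling probability, so everything here is an assumption: categorical features, Laplace smoothing, a minority label known in advance, and a hypothetical `majority_prob` parameter. The class name `OnlineUndersamplingNB` is likewise hypothetical; this shows the general idea of online under-sampling, not the authors' implementation.

```python
import math
import random
from collections import defaultdict


class OnlineUndersamplingNB:
    """Hypothetical sketch: categorical naive Bayes updated one instance at a time.

    Minority-class instances are always learned; majority-class instances
    are learned only with probability `majority_prob` (an assumed parameter).
    """

    def __init__(self, minority_label, majority_prob=0.1, alpha=1.0, seed=None):
        self.minority_label = minority_label
        self.majority_prob = majority_prob
        self.alpha = alpha  # Laplace smoothing constant (assumed)
        self.rng = random.Random(seed)
        self.class_counts = defaultdict(int)
        # feature_counts[label][i][value]: times `value` of feature i was seen with `label`
        self.feature_counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        # Distinct values observed per feature, used in the smoothing denominator
        self.feature_values = defaultdict(set)
        self.n_used = 0

    def learn_one(self, x, y):
        """Update counts with (x, y); return True if the instance was used."""
        if y != self.minority_label and self.rng.random() >= self.majority_prob:
            return False  # majority instance skipped: online under-sampling
        self.class_counts[y] += 1
        self.n_used += 1
        for i, v in enumerate(x):
            self.feature_counts[y][i][v] += 1
            self.feature_values[i].add(v)
        return True

    def predict_one(self, x):
        """Return the learned class with the highest smoothed log-posterior."""
        n_classes = len(self.class_counts)
        best_label, best_score = None, -math.inf
        for label, n_c in self.class_counts.items():
            # Smoothed log-prior, estimated from the instances actually used
            score = math.log((n_c + self.alpha) / (self.n_used + self.alpha * n_classes))
            for i, v in enumerate(x):
                k = max(len(self.feature_values[i]), 1)
                count = self.feature_counts[label][i].get(v, 0)
                score += math.log((count + self.alpha) / (n_c + self.alpha * k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Because each majority instance is kept independently with probability `majority_prob`, a stream with roughly r majority instances per minority instance yields, in expectation, about `majority_prob * r` majority instances per minority instance in the training updates; choosing `majority_prob` near 1/r would therefore give a roughly balanced model. Note that the class prior is estimated from the sampled counts, which is what shifts the decision boundary toward the minority class.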
Dianlong You, Jiawei Xiao, Yang Wang, Huigui Yan, Di Wu, Zhen Chen, Limin Shen, Xindong Wu
Alberto Fernández, Salvador García, Mikel Galar, Ronaldo C. Prati, Bartosz Krawczyk, Francisco Herrera
Zhong Chen, Victor S. Sheng, Andrea Edwards, Kun Zhang