JOURNAL ARTICLE

Online learning from imbalanced data streams

Abstract

Learning from imbalanced data has conventionally been studied on stationary data sets. Several methods have recently been proposed for mining imbalanced data streams, in which training data is read in consecutive chunks. Each chunk is treated as a conventional imbalanced data set, which makes it easy to apply sampling methods to balance it. However, one drawback of chunk-based learning is that the classification model is not updated until a full chunk has been received. This paper therefore proposes a new method for online learning from imbalanced data streams that uses naive Bayes as the base learner. To deal with class imbalance, every new training instance from the minority class is used for learning, whereas an instance from the majority class is used only with a small probability. In effect, the method corresponds to under-sampling the imbalanced data stream. We show the effectiveness of the proposed method on ten UCI data sets from various domains. Problems in the performance of naive Bayes on imbalanced data sets are also discussed.
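
The update rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `UndersamplingOnlineNB`, the parameter `majority_keep_prob`, its default of 0.1, the restriction to categorical features, and the Laplace smoothing are all assumptions made for illustration, since the abstract states only that majority instances are used "with a small probability".

```python
import math
import random
from collections import defaultdict


class UndersamplingOnlineNB:
    """Illustrative online categorical naive Bayes with probabilistic under-sampling."""

    def __init__(self, minority_label, majority_keep_prob=0.1, seed=0):
        self.minority_label = minority_label
        # Assumed value: the abstract does not say how small the probability is.
        self.majority_keep_prob = majority_keep_prob
        self.rng = random.Random(seed)
        self.class_counts = defaultdict(int)
        # feature_counts[label][feature_index][value] -> count
        self.feature_counts = defaultdict(
            lambda: defaultdict(lambda: defaultdict(int))
        )
        # Distinct values seen per feature, used for Laplace smoothing.
        self.feature_values = defaultdict(set)

    def learn_one(self, x, y):
        """Update the counts with one instance; return True if it was used."""
        is_majority = y != self.minority_label
        if is_majority and self.rng.random() >= self.majority_keep_prob:
            return False  # majority instance skipped (under-sampling)
        self.class_counts[y] += 1
        for i, value in enumerate(x):
            self.feature_counts[y][i][value] += 1
            self.feature_values[i].add(value)
        return True

    def predict_one(self, x):
        """Return the label with the highest smoothed log posterior, or None."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for i, value in enumerate(x):
                k = len(self.feature_values[i]) or 1
                count = self.feature_counts[label][i][value]
                score += math.log((count + 1) / (n + k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Because each instance updates the counts as soon as it arrives, the model never waits for a full chunk. With `majority_keep_prob` set near the minority-to-majority ratio, the two classes contribute roughly equal numbers of training instances over time.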

Keywords:
Computer science; Data stream mining; Machine learning; Concept drift; Naive Bayes classifier; Artificial intelligence; Data mining; Class (philosophy); Data sampling; Data set; Data stream; Labeled data; Set (abstract data type); Sampling (signal processing); Training set; Support vector machine

Metrics

Cited By: 47
FWCI (Field Weighted Citation Impact): 5.48
Refs: 20
Citation Normalized Percentile: 0.96

Topics

Imbalanced Data Classification Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Data Stream Mining Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Anomaly Detection Techniques and Applications
Physical Sciences → Computer Science → Artificial Intelligence