Class imbalance hampers classification. In streaming environments, the problem is harder still because the class proportions can vary over time. Approaches based on misclassification costs can mitigate it. In this paper, we present the Cost-sensitive Adaptive Random Forest (CSARF) and compare it with the Adaptive Random Forest (ARF) and ARF with Resampling (ARFRE) on six real-world and six synthetic data sets with different class ratios. The empirical study analyzes two misclassification cost strategies for CSARF and shows that CSARF achieves statistically superior average recall and average F1 compared with ARF.