Class imbalance in a dataset affects the performance of supervised learning models. Most real-world datasets are imbalanced, and many are also high dimensional. Existing classification methods tend to perform very well on the majority class while giving little weight to the minority class. Most solutions proposed for imbalanced datasets do not carry over to high dimensional imbalanced datasets. This paper compares the performance of an existing balancing method, cluster concentric circle based undersampling (C3BUS), on low dimensional imbalanced datasets versus high dimensional imbalanced datasets. This work shows that C3BUS works quite well on low dimensional imbalanced datasets compared with high dimensional imbalanced datasets, and that class imbalance and high dimensionality are two of the main issues in the supervised learning process.