Hazel A. Gameng, Bobby B. Gerardo, Ruji P. Medina
Oversampling techniques are applied during data preprocessing to mitigate the class-imbalance problem that arises in real research scenarios. Imbalance can reduce the ability of classification algorithms to recognize cases of interest, causing positive samples to be misclassified as negative or generating false positives. The Synthetic Minority Oversampling Technique (SMOTE) is one such oversampling technique, and Adaptive Synthetic sampling (Adasyn) is one of its many variants. Adasyn incorporates the K-Nearest Neighbor (KNN) algorithm; in this study, the Manhattan distance is applied in the KNN computations. The modified Adasyn was evaluated in terms of overall accuracy, precision, recall, and F1 measure on six imbalanced datasets, using logistic regression as the classification algorithm. The modified Adasyn outperformed SMOTE and the original Adasyn on 66.67 percent of the total performance-metric count: it led in accuracy and recall on 4 of the 6 datasets, in precision on 3 of 6, and in F1 measure on 5 of 6. This shows that the modified Adasyn can provide an efficient means of decreasing misclassification on imbalanced datasets.
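To make the described modification concrete, the sketch below implements Adasyn-style oversampling in NumPy with the KNN step computed under the Manhattan (L1) distance instead of the usual Euclidean distance. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `beta` balancing level, and the fallback when no minority sample lies near the class boundary are all illustrative choices.

```python
import numpy as np

def manhattan_knn(query, data, k):
    """Indices of the k nearest rows of `data` to `query` under L1 distance."""
    d = np.abs(data - query).sum(axis=1)
    return np.argsort(d)[:k]

def adasyn_manhattan(X, y, minority_label=1, k=5, beta=1.0, rng=None):
    """Adasyn-style oversampling with Manhattan-distance KNN (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    X_min = X[y == minority_label]
    X_maj = X[y != minority_label]
    G = int((len(X_maj) - len(X_min)) * beta)  # total synthetic samples to create
    if G <= 0 or len(X_min) < 2:
        return X, y

    # r[i]: fraction of majority points among each minority sample's k nearest
    # neighbours in the full dataset, computed with Manhattan distance.
    r = np.empty(len(X_min))
    for i, x in enumerate(X_min):
        idx = manhattan_knn(x, X, k + 1)[1:]   # skip the point itself
        r[i] = np.sum(y[idx] != minority_label) / k
    if r.sum() == 0:
        r[:] = 1.0                             # no borderline points: spread evenly
    g = np.rint(r / r.sum() * G).astype(int)   # per-sample synthesis quota

    synth = []
    for i, x in enumerate(X_min):
        if g[i] == 0:
            continue
        # Interpolate toward Manhattan-nearest *minority* neighbours.
        idx = manhattan_knn(x, X_min, min(k + 1, len(X_min)))[1:]
        for _ in range(g[i]):
            z = X_min[rng.choice(idx)]
            lam = rng.random()
            synth.append(x + lam * (z - x))

    if not synth:
        return X, y
    X_new = np.vstack([X, np.asarray(synth)])
    y_new = np.concatenate([y, np.full(len(synth), minority_label)])
    return X_new, y_new
```

Because hard-to-learn minority samples (those with many majority neighbours) receive larger quotas `g[i]`, synthesis is concentrated near the decision boundary, which is the adaptive behaviour that distinguishes Adasyn from plain SMOTE.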