Although many studies have investigated the use of machine learning (ML) algorithms for cross-project software defect prediction, few have addressed cross-project defect number prediction (CPDNP), that is, predicting the number of defects in a target project with models trained on other projects. This study investigated the robustness of feature selection in five ML algorithms, namely Decision Trees, Random Forests, Gradient Boosted Trees, Generalized Linear Model, and Deep Neural Networks, for CPDNP. The results showed that Gradient Boosted Trees generated the models with the lowest errors in most projects. However, the models generated by Random Forests and Deep Neural Networks with feature selection were the most robust, whereas those generated by Decision Trees were the least robust. Feature selection was sufficient to generate robust ML models for CPDNP.
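The abstract does not spell out the evaluation pipeline, so the following is only a minimal sketch of what a cross-project setup with feature selection can look like. It assumes a leave-one-project-out split (train on all other projects, test on the held-out one), a correlation-based filter as the feature selection step, and an ordinary least-squares linear model standing in for a Gaussian Generalized Linear Model. These are assumptions, not the study's five algorithms or its robustness measure. All function names and the toy data are hypothetical.

```python
import math
from statistics import mean

# Hypothetical toy data (not from the study): each project maps to a list of
# (metric_vector, defect_count) pairs, one pair per module. Here the defect
# count is 2 * metric[0] + 1 and metric[1] is irrelevant noise.
EXAMPLE_PROJECTS = {
    "A": [([1.0, 5.0], 3.0), ([2.0, 1.0], 5.0), ([3.0, 4.0], 7.0)],
    "B": [([4.0, 2.0], 9.0), ([0.0, 3.0], 1.0), ([5.0, 0.0], 11.0)],
    "C": [([6.0, 1.0], 13.0), ([1.5, 2.0], 4.0), ([2.5, 6.0], 6.0)],
}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(rows, k):
    """Filter selection (assumed): keep the k metrics with the largest |correlation| to defect counts."""
    ys = [y for _, y in rows]
    n_feat = len(rows[0][0])
    scores = [abs(pearson([x[j] for x, _ in rows], ys)) for j in range(n_feat)]
    return sorted(range(n_feat), key=lambda j: -scores[j])[:k]


def fit_linear(rows, cols, ridge=1e-6):
    """Least-squares linear model with intercept, used here as a stand-in for a Gaussian GLM."""
    X = [[1.0] + [x[j] for j in cols] for x, _ in rows]
    y = [t for _, t in rows]
    p = len(X[0])
    # Normal equations (X^T X + ridge * I) w = X^T y; the tiny ridge avoids singular systems.
    A = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for k in range(c, p):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    # Back substitution.
    w = [0.0] * p
    for i in reversed(range(p)):
        w[i] = (b[i] - sum(A[i][k] * w[k] for k in range(i + 1, p))) / A[i][i]
    return w


def predict(w, x, cols):
    """Predicted defect count, clipped at zero because counts cannot be negative."""
    return max(0.0, w[0] + sum(wi * x[j] for wi, j in zip(w[1:], cols)))


def cross_project_mae(projects, k=None):
    """Leave-one-project-out: train on all other projects, report MAE on the held-out one."""
    results = {}
    for target, test_rows in projects.items():
        train = [row for name, rows in projects.items() if name != target for row in rows]
        n_feat = len(train[0][0])
        cols = select_features(train, k) if k is not None else list(range(n_feat))
        w = fit_linear(train, cols)
        results[target] = mean(abs(predict(w, x, cols) - y) for x, y in test_rows)
    return results


if __name__ == "__main__":
    print("all features:", cross_project_mae(EXAMPLE_PROJECTS))
    print("top-1 feature:", cross_project_mae(EXAMPLE_PROJECTS, k=1))
```

Comparing the per-project errors with and without selection (the `k` argument) is one simple way to see how sensitive a model is to feature selection across target projects. In the toy data the selector keeps only the informative metric. A study like the one described would swap in the actual learners and whatever robustness criterion it defines.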
Aries Saifudin, Agung Trisetyarso, Wawan Suparta, Chuanze Kang, B S Abbas, Yaya Heryadi
Tianwei Lei, Jingfeng Xue, Weijie Han
Qing He, Biwen Li, Beijun Shen, Yong Xia
Ahmed Abdu, Zhengjun Zhai, Hakim A. Abdo, Redhwan Algabri, Sungon Lee
Songsong Ling, Bin Tang, Tao Ye, Qiang Hu, Junwei Du, Xu Yu