Software defect prediction (SDP) plays a significant part in identifying the most defect-prone modules before software testing and in allocating limited testing resources. One of the most commonly used scenarios in SDP is classification. To guarantee prediction accuracy, the classification models must first be trained appropriately. The training data can be obtained from historical software repositories, and its quality may affect classification performance to a large extent. To improve data quality, we propose a novel software feature selection method that uses information flows to perform causality analysis on the features of training datasets. More specifically, we conduct causality analysis between each feature metric and the labeled metric bug; then, based on the resulting feature ranking list, we select the top-k features to control redundancy. Finally, we choose the most suitable feature subset based on the F-measure. To demonstrate the effectiveness and practicability of the feature selection method, we use the Nearest Neighbor approach to construct a homogeneous training dataset and apply three commonly used classification models in comparison experiments. The experimental results verify the applicability and validity of the feature selection method.
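The rank, top-k, and F-measure steps described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the method itself: the information-flow causality score is replaced by the absolute Pearson correlation as a stand-in, a 1-nearest-neighbour classifier stands in for the three classification models (which the abstract does not name), and all function names and the toy data are hypothetical.

```python
import math


def abs_correlation(xs, ys):
    """Stand-in relevance score: |Pearson correlation| of a feature with the bug label.

    ASSUMPTION: this replaces the information-flow causality measure,
    which is not specified in the text above.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / math.sqrt(vx * vy)


def rank_features(rows, labels, score=abs_correlation):
    """Return feature indices sorted from highest to lowest score."""
    n_features = len(rows[0])
    scores = [score([r[j] for r in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def f_measure(y_true, y_pred):
    """F-measure (harmonic mean of precision and recall) for the defective class 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def predict_1nn(train_rows, train_labels, test_rows, features):
    """Placeholder classifier: 1-nearest neighbour restricted to the given features."""
    preds = []
    for t in test_rows:
        nearest = min(
            range(len(train_rows)),
            key=lambda i: sum((train_rows[i][j] - t[j]) ** 2 for j in features),
        )
        preds.append(train_labels[nearest])
    return preds


def select_features(train_rows, train_labels, valid_rows, valid_labels):
    """Try every top-k prefix of the ranking and keep the one with the best F-measure."""
    ranking = rank_features(train_rows, train_labels)
    best_subset, best_f = [], -1.0
    for k in range(1, len(ranking) + 1):
        subset = ranking[:k]
        preds = predict_1nn(train_rows, train_labels, valid_rows, subset)
        f = f_measure(valid_labels, preds)
        if f > best_f:  # strict '>' prefers the smaller subset on ties
            best_subset, best_f = subset, f
    return best_subset, best_f


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
train_rows = [[1.0, 5.0], [2.0, 1.0], [8.0, 4.0], [9.0, 2.0]]
train_labels = [0, 0, 1, 1]
valid_rows = [[1.5, 2.0], [8.5, 5.0]]
valid_labels = [0, 1]

print(select_features(train_rows, train_labels, valid_rows, valid_labels))
```

Because ties are broken in favour of the shorter prefix, the sketch keeps the smallest top-k subset that reaches the best F-measure, which is one simple way to limit redundancy.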
Chao Ni, Wangshu Liu, Xiang Chen, Qing Gu, Daoxu Chen, Qiguo Huang
Aries Saifudin, Agung Trisetyarso, Wawan Suparta, Chuanze Kang, B S Abbas, Yaya Heryadi
Siyu Jiang, Teng Ouyang, Wenhong Wang, Yunpeng Shang
Yi Zhu, Yu Zhao, Qiao Yu, Xiaoying Chen