Cross-project defect prediction (CPDP) has recently attracted increasing attention in software engineering. Most previous studies treat it as a binary classification or regression problem, which limits their practical value for software testing activities. To provide developers with a more useful ranking of the most severe entities (e.g., classes and modules), we propose a top-k learning to rank (LTR) approach for the CPDP scenario. In particular, we first convert the number of defects into graded relevance to a specific query according to the three-sigma rule; we then put forward a new data resampling method, SMOTE-PENN, to tackle the imbalanced data problem. An empirical study on the PROMISE dataset shows that SMOTE-PENN outperforms six other competitive resampling algorithms and that RankNet performs best within the proposed framework. Our work could thus lay a foundation for efficient search engines for top-ranked defective entities in real software testing activities when no local historical data are available for the target project.
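As a rough illustration of converting defect counts into graded relevance with the three-sigma rule, the following Python sketch may help. The abstract does not give the exact mapping, so the grade boundaries used here (the mean plus one, two, and three standard deviations), the choice of the population standard deviation, and the function name `graded_relevance` are all assumptions made for illustration, not the authors' definition.

```python
from statistics import mean, pstdev


def graded_relevance(defect_counts):
    """Map per-entity defect counts to integer relevance grades.

    Assumed scheme (hypothetical, not taken from the paper):
      grade 0: no defects
      grade 1: defective, count <= mean + 1 * sigma
      grade 2: count >  mean + 1 * sigma
      grade 3: count >  mean + 2 * sigma
      grade 4: count >  mean + 3 * sigma
    """
    mu = mean(defect_counts)
    sigma = pstdev(defect_counts)  # population standard deviation (assumption)

    grades = []
    for count in defect_counts:
        if count == 0:
            grades.append(0)
            continue
        # Count how many of the three sigma thresholds this entity exceeds.
        exceeded = sum(count > mu + k * sigma for k in (1, 2, 3))
        grades.append(1 + exceeded)
    return grades


# Example: a project in which most entities are clean and one is an outlier.
counts = [0] * 16 + [1, 1, 2, 30]
grades = graded_relevance(counts)
```

Under this assumed scheme, clean entities stay at grade 0 and only entities whose defect count lies far above the project mean receive the highest grades, which is the kind of graded label a top-k ranking model can be trained on.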
Ali Bou Nassif, Manar Abu Talib, Mohammad Azzeh, Shaikha Alzaabi, Rawan Khanfar, Ruba Kharsa, Lefteris Angelis
Wenbo Mi, Yong Li, Ming Wen, Youren Chen
Yahaya Zakariyau Bala, Pathiah Abdul Samat, Khaironi Yatim Sharif, Noridayu Manshor
Jing Sun, Xiao-Yuan Jing, Xiwei Dong