Traditionally, software fault prediction models are built by assuming a uniform misclassification cost. In other words, cost implications of misclassifying a faulty module as fault free are assumed to be the same as the cost implications of misclassifying a fault free module as faulty. In reality, these two types of misclassification costs are rarely equal. They are project-specific, reflecting the characteristics of the domain in which the program operates. In this paper, using project information from a public repository, we analyze the benefits of techniques which incorporate misclassification costs in the development of software fault prediction models. We find that cost-sensitive learning does not provide operational points which outperform cost-insensitive classifiers. However, an advantage of cost-sensitive modeling is the explicit choice of the operational threshold appropriate for the cost differential.
Ling XuWang BeiLing LiuMo ZhouShengping LiaoMeng Yan
Zhiqiang LiXiao‐Yuan JingXiaoke Zhu
Yue JiangBojan ČukićTim Menzies
Koen W. De BockKristof CoussementStefan Lessmann