JOURNAL ARTICLE

Misclassification Cost-Sensitive Software Defect Prediction

Abstract

Software defect prediction helps developers focus on defective modules for efficient software quality assurance. A common goal shared by existing software defect prediction methods is to attain low classification error rates. These methods suffer from two practical problems: (i) Most prediction methods rely on a large amount of labeled training data. However, collecting labeled data is difficult and expensive, and classification labels are hard to obtain for new software projects or for existing projects without historical defect data. (ii) Software defect datasets are highly imbalanced, and in many real-world applications the misclassification cost of a defective module is several times higher than that of a non-defective one. In this paper, we present a misclassification Cost-sensitive approach to Software Defect Prediction (CSDP). The CSDP approach is novel in two respects. First, CSDP addresses the problem of unlabeled software defect datasets by combining an unsupervised sampling method with a domain-specific misclassification cost model; this preprocessing step selectively samples a small percentage of modules by estimating their classification labels. Second, CSDP builds a cost-sensitive support vector machine model to predict the defect-proneness of the remaining modules, using both the overall classification error rate and the domain-specific misclassification cost as quality metrics. CSDP is evaluated on four NASA projects. The experimental results highlight three observations: (1) CSDP achieves a lower Normalized Expected Cost of Misclassification (NECM) than state-of-the-art supervised learning models under imbalanced training data with limited labeling. (2) CSDP outperforms state-of-the-art semi-supervised learning methods, which disregard classification costs, especially in recall. (3) CSDP enhanced with unsupervised sampling as a preprocessing step prior to training and prediction outperforms the baseline CSDP without the sampling step.
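To make the evaluation metric concrete, a minimal sketch of NECM as it is commonly defined in cost-sensitive defect prediction, NECM = (C_FP·FP + C_FN·FN) / N, where a false negative (a missed defect) carries a higher cost than a false alarm. The cost ratio of 5 below is illustrative, not a value taken from this paper:

```python
def necm(y_true, y_pred, cost_fn=5.0, cost_fp=1.0):
    """Normalized Expected Cost of Misclassification.

    Assumes the common definition NECM = (cost_fp*FP + cost_fn*FN) / N,
    where label 1 marks a defective module. The cost ratio (5:1 here)
    is an illustrative assumption, not the paper's setting.
    """
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # missed defects
    return (cost_fp * fp + cost_fn * fn) / len(y_true)

# toy example: 10 modules, one false alarm and one missed defect
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 0, 1]
print(necm(y_true, y_pred))  # (1*1 + 5*1) / 10 = 0.6
```

Because the false-negative term dominates, a classifier can lower its NECM by trading some false alarms for fewer missed defects, which is exactly the behavior a cost-sensitive SVM is trained to prefer.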

Keywords:
Computer science, Preprocessor, Software, Machine learning, Data mining, Artificial intelligence, Software bug, Software quality assurance, Support vector machine, Software metric, Software quality, Data pre-processing, Quality assurance, Software development, Engineering

Metrics

Cited by: 4
FWCI (Field-Weighted Citation Impact): 1.34
References: 32
Citation Normalized Percentile: 0.85

Topics

Software Engineering Research
Physical Sciences →  Computer Science →  Information Systems
Software Reliability and Analysis Research
Physical Sciences →  Computer Science →  Software
Imbalanced Data Classification Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Cost Sensitive Boosting Software Defect Prediction Method

LI Li, REN Zhenkang, SHI Kexin

Journal:   DOAJ (DOAJ: Directory of Open Access Journals) Year: 2022
JOURNAL ARTICLE

Cost-sensitive Dictionary Learning for Software Defect Prediction

Liang Niu, Jianwu Wan, Hongyuan Wang, Kaiwei Zhou

Journal:   Neural Processing Letters Year: 2020 Vol: 52 (3) Pages: 2415-2449
JOURNAL ARTICLE

Software defect prediction using cost-sensitive neural network

Ömer Faruk Arar, Kürşat Ayan

Journal:   Applied Soft Computing Year: 2015 Vol: 33 Pages: 263-277