JOURNAL ARTICLE

Software Defect Prediction Using Semi-Supervised Learning with Change Burst Information

Abstract

Software defect prediction is an important software quality assurance technique. It uses historical project data and previously discovered defects to predict potential defects. However, most existing methods assume that large amounts of labeled historical data are available for prediction, whereas projects in the early stages of their life cycle may lack the data needed to build such predictors. In addition, most existing techniques use static code metrics as predictors and omit change information that may introduce risks into software development. In this paper, we take these two issues into consideration and propose a semi-supervised defect prediction approach, extRF. extRF extends the classical supervised Random Forest algorithm with the self-training paradigm. It also employs change burst information to improve the accuracy of software defect prediction. We conduct an experiment to evaluate extRF against three supervised machine learners (Logistic Regression, Naive Bayes, and Random Forest) and to compare the effectiveness of code metrics, change burst metrics, and a combination of the two. Experimental results show that extRF trained with a small labeled dataset achieves performance comparable to that of some supervised learning approaches trained with a larger labeled dataset. When only 2% of the Eclipse 2.0 data are used for training, extRF achieves an F-measure of about 0.562, close to that of Logistic Regression (LR), a supervised learning approach, at a labeled sampling rate of 50%. In addition, change burst metrics outperform code metrics: the F-measure rises to a peak value of 0.75 for Eclipse 3.0 and JDT.Core.
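The abstract does not spell out how extRF combines self-training with Random Forest, so the following is only a minimal sketch of the general self-training idea under stated assumptions. A small ensemble of bootstrapped one-split trees (decision stumps) stands in for a real Random Forest, and the function names, the 0.9 confidence threshold, and the two example features (lines changed, number of change bursts) are all hypothetical. The sketch also includes the F-measure (harmonic mean of precision and recall) used to report results.

```python
import random

def train_stump(X, y, rng):
    """Fit a one-split tree on a bootstrap sample, using one random feature."""
    n = len(X)
    rows = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
    f = rng.randrange(len(X[0]))                     # random feature choice
    values = sorted({X[i][f] for i in rows})
    majority = int(2 * sum(y[i] for i in rows) >= n)
    # Fallback when no split exists: always predict the majority label.
    best = (n + 1, f, float("inf"), majority, majority)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2                            # midpoint threshold
        left = [y[i] for i in rows if X[i][f] <= t]
        right = [y[i] for i in rows if X[i][f] > t]
        l_lab = int(2 * sum(left) >= len(left))
        r_lab = int(2 * sum(right) >= len(right))
        errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
        if errors < best[0]:
            best = (errors, f, t, l_lab, r_lab)
    return best[1:]

def train_forest(X, y, n_trees, rng):
    return [train_stump(X, y, rng) for _ in range(n_trees)]

def defect_probability(forest, x):
    """Fraction of trees voting 'defective' (label 1) for module x."""
    votes = sum(l if x[f] <= t else r for f, t, l, r in forest)
    return votes / len(forest)

def self_train(X_lab, y_lab, X_unlab, n_trees=25, threshold=0.9,
               max_rounds=10, seed=0):
    """Repeatedly pseudo-label confident unlabeled modules and retrain."""
    rng = random.Random(seed)
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    forest = train_forest(X_lab, y_lab, n_trees, rng)
    for _ in range(max_rounds):
        confident, rest = [], []
        for x in pool:
            p = defect_probability(forest, x)
            if p >= threshold:
                confident.append((x, 1))
            elif p <= 1 - threshold:
                confident.append((x, 0))
            else:
                rest.append(x)
        if not confident:                            # nothing new to learn from
            break
        for x, label in confident:
            X_lab.append(x)
            y_lab.append(label)
        pool = rest
        forest = train_forest(X_lab, y_lab, n_trees, rng)
    return forest, len(X_lab)

def f_measure(y_true, y_pred):
    """F1 score: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

In this sketch only modules on which at least 90% of the trees agree are pseudo-labeled, which limits the propagation of wrong labels at the cost of leaving ambiguous modules unused. A production setup would more likely rely on an established Random Forest implementation than on the stump ensemble shown here.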

Keywords:
Computer science; Machine learning; Eclipse; Artificial intelligence; Random forest; Naive Bayes classifier; Supervised learning; Software; Data mining; Source code; Software metric; Measure (data warehouse); Software quality; Software development; Support vector machine; Artificial neural network

Metrics

Cited By: 19
FWCI (Field Weighted Citation Impact): 4.65
Refs: 41
Citation Normalized Percentile: 0.95

Topics

Software Engineering Research
Physical Sciences →  Computer Science →  Information Systems
Software Reliability and Analysis Research
Physical Sciences →  Computer Science →  Software
Software System Performance and Reliability
Physical Sciences →  Computer Science →  Computer Networks and Communications

Related Documents

JOURNAL ARTICLE

Semi‐supervised Software Defect Prediction Using Task‐Driven Dictionary Learning

Ming Cheng, Guoqing Wu, Mengting Yuan, Hongyan Wan

Journal: Chinese Journal of Electronics Year: 2016 Vol: 25 (6) Pages: 1089-1096
JOURNAL ARTICLE

Sample-based software defect prediction with active and semi-supervised learning

Ming Li, Hongyu Zhang, Rongxin Wu, Zhi‐Hua Zhou

Journal: Automated Software Engineering Year: 2011 Vol: 19 (2) Pages: 201-230
JOURNAL ARTICLE

Label propagation based semi-supervised learning for software defect prediction

Zhiwu Zhang, Xiao‐Yuan Jing, Tiejian Wang

Journal: Automated Software Engineering Year: 2016 Vol: 24 (1) Pages: 47-69
JOURNAL ARTICLE

An improved semi-supervised learning method for software defect prediction

Ma Ying, Pan Weiwei, Zhu Shunzhi, Yin Huayi, Luo Jian

Journal: Journal of Intelligent & Fuzzy Systems Year: 2014