Software Defect Prediction Using Semi-Supervised Learning with Change Burst Information

Qing He; Beijun Shen; Yuting Chen

doi:10.1109/compsac.2016.193

JOURNAL ARTICLE

Year: 2016 Pages: 113-122

Abstract

Software defect prediction is an important software quality assurance technique. It uses historical project data and previously discovered defects to predict potential defects. However, most existing methods assume that large amounts of labeled historical data are available for prediction, whereas projects in the early stages of their life cycle may lack the data needed to build such predictors. In addition, most existing techniques use static code metrics as predictors and omit change information, even though changes may introduce risks into software development. In this paper, we take these two issues into consideration and propose a semi-supervised defect prediction approach, extRF. extRF extends the classical supervised Random Forest algorithm with the self-training paradigm. It also employs change burst information to improve the accuracy of software defect prediction. We conduct an experiment to evaluate extRF against three supervised machine learners (Logistic Regression, Naive Bayes, and Random Forest) and to compare the effectiveness of code metrics, change burst metrics, and a combination of the two. Experimental results show that extRF trained on a small labeled dataset achieves performance comparable to that of some supervised learning approaches trained on a larger labeled dataset. When only 2% of the Eclipse 2.0 data are used for training, extRF achieves an F-measure of about 0.562, close to that of LR (Logistic Regression, a supervised learning approach) at a labeled sampling rate of 50%. Moreover, change burst metrics outperform code metrics: the F-measure rises to a peak value of 0.75 for Eclipse 3.0 and JDT.Core.
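
The self-training idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' extRF implementation: the abstract does not give the confidence threshold, the number of trees, or the stopping rule, so `confidence`, `n_trees`, and `max_rounds` are assumptions. A small bagged ensemble of one-split trees stands in for a full Random Forest, and the two features (lines of code, number of change bursts) and all data values are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, feature):
    """Fit a one-split tree on a single feature by minimising training errors."""
    best = None
    for t in sorted({row[feature] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[feature] <= t]
        right = [lab for row, lab in zip(X, y) if row[feature] > t]
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
        errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
        if best is None or errors < best[0]:
            best = (errors, (feature, t, l_lab, r_lab))
    return best[1]


def fit_forest(X, y, n_trees, rng):
    """Bagged stumps, each on a bootstrap sample and one random feature.

    Simplified stand-in for Random Forest (assumption, not the paper's learner).
    """
    if len(set(y)) < 2:
        raise ValueError("labeled data must contain both classes")
    n, forest = len(X), []
    for _ in range(n_trees):
        while True:  # redraw bootstrap samples that contain only one class
            idx = [rng.randrange(n) for _ in range(n)]
            if len({y[i] for i in idx}) == 2:
                break
        feature = rng.randrange(len(X[0]))
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict_proba(forest, row):
    """Fraction of trees voting 'defective' (label 1)."""
    votes = sum(l if row[f] <= t else r for f, t, l, r in forest)
    return votes / len(forest)


def self_train(X_lab, y_lab, X_unlab, n_trees=25, confidence=0.9, max_rounds=10, seed=0):
    """Self-training: repeatedly pseudo-label confident unlabeled rows and retrain."""
    rng = random.Random(seed)
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    forest = fit_forest(X_lab, y_lab, n_trees, rng)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for row in pool:
            p = predict_proba(forest, row)
            if p >= confidence:
                confident.append((row, 1))
            elif p <= 1 - confidence:
                confident.append((row, 0))
            else:
                remaining.append(row)
        if not confident:
            break  # nothing new to learn from
        for row, lab in confident:
            X_lab.append(row)
            y_lab.append(lab)
        pool = remaining
        forest = fit_forest(X_lab, y_lab, n_trees, rng)
        if not pool:
            break
    return forest, X_lab, y_lab


# Hypothetical modules: [lines_of_code, number_of_change_bursts]; 1 = defective.
labeled_X = [[120, 1], [200, 2], [900, 6], [1500, 9]]
labeled_y = [0, 0, 1, 1]
unlabeled_X = [[80, 0], [100, 1], [1100, 7], [2000, 12]]

forest, X_aug, y_aug = self_train(labeled_X, labeled_y, unlabeled_X)
print(len(y_aug) - len(labeled_y), "modules were pseudo-labeled")
```

The loop only trusts predictions on which nearly all trees agree, which is one common way to limit the propagation of wrong pseudo-labels; a real setup would use a full Random Forest and a held-out set to choose the threshold.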

Keywords:

Computer science; Machine learning; Eclipse; Artificial intelligence; Random forest; Naive Bayes classifier; Supervised learning; Software; Data mining; Source code; Software metric; Measure (data warehouse); Software quality; Software development; Support vector machine; Artificial neural network

Metrics

FWCI (Field Weighted Citation Impact): 4.65

Citation Normalized Percentile: 0.95

Topics

Software Engineering Research

Physical Sciences → Computer Science → Information Systems

Software Reliability and Analysis Research

Physical Sciences → Computer Science → Software

Software System Performance and Reliability

Physical Sciences → Computer Science → Computer Networks and Communications

Related Documents

Software defect prediction using semi-supervised learning with dimension reduction

Semi‐supervised Software Defect Prediction Using Task‐Driven Dictionary Learning

Sample-based software defect prediction with active and semi-supervised learning

Label propagation based semi-supervised learning for software defect prediction

An improved semi-supervised learning method for software defect prediction