JOURNAL ARTICLE

Software Defect Prediction Method Based on Clustering Ensemble Learning

Abstract

Software defect prediction aims to assess and predict potential defects in software projects, and it has advanced considerably in recent years. Earlier work relied largely on supervised learning, which requires a substantial amount of labeled historical defect data to train models, and obtaining such labels demands significant time and resources. Unsupervised defect prediction does not depend on labeled data, so it avoids large-scale labeling, saves time and resources, and offers a more flexible way to assure software quality. This paper applies unsupervised learning to defect prediction on 16 projects drawn from two public datasets (PROMISE and NASA). For feature selection, a chi-squared sparse feature selection method is proposed that combines chi-squared tests with sparse principal component analysis (SPCA): the chi-squared test first selects the most statistically significant features, and SPCA then reduces the dimensionality of the selected features. For clustering, the dot product matrix and the Pearson correlation coefficient (PCC) matrix are used to construct weighted adjacency matrices, and a clustering overlap method is proposed that integrates spectral clustering, Newman clustering, fluid clustering, and Clauset–Newman–Moore (CNM) clustering through ensemble learning. Experimental results indicate that, in the absence of labeled data, the chi-squared sparse method yields superior feature selection performance, and the proposed clustering overlap method matches or outperforms the four baseline clustering methods.
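
The abstract does not give formulas, so the following is only a minimal sketch of how the two weighted adjacency matrices and an ensemble "overlap" step might look, not the authors' implementation. All function names (`dot_product_adjacency`, `pcc_adjacency`, `overlap_vote`) are hypothetical. Three choices are assumptions made for illustration: the diagonal is zeroed so no module links to itself, negative Pearson correlations are clipped to zero so edge weights stay non-negative, and the overlap step is read as a simple vote over binary labels (1 = flagged as defect-prone) that the base clusterings have already produced.

```python
import math


def dot_product_adjacency(X):
    """Weighted adjacency from pairwise dot products of module feature vectors.

    X is a list of equal-length feature vectors, one per software module.
    Assumption: self-loops are excluded (diagonal stays 0).
    """
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = sum(a * b for a, b in zip(X[i], X[j]))
            W[i][j] = W[j][i] = float(w)
    return W


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    mu_u = sum(u) / len(u)
    mu_v = sum(v) / len(v)
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    var_u = sum((a - mu_u) ** 2 for a in u)
    var_v = sum((b - mu_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_u * var_v)


def pcc_adjacency(X):
    """Weighted adjacency from pairwise Pearson correlations.

    Assumption: negative correlations are clipped to 0.
    """
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            W[i][j] = W[j][i] = max(0.0, pearson(X[i], X[j]))
    return W


def overlap_vote(label_sets, min_votes):
    """Flag a module when at least `min_votes` base clusterings flag it.

    label_sets holds one list of 0/1 labels per base clustering method.
    Assumption: this voting rule stands in for the paper's overlap method.
    """
    n = len(label_sets[0])
    return [
        1 if sum(labels[k] for labels in label_sets) >= min_votes else 0
        for k in range(n)
    ]


# Toy data: three modules described by three metrics each.
X = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
W_dot = dot_product_adjacency(X)
W_pcc = pcc_adjacency(X)

# Four hypothetical base clusterings, each labeling the three modules.
flags = overlap_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0]], min_votes=2)
```

In a fuller pipeline, each adjacency matrix would be handed to a graph clustering routine (spectral, modularity-based, and so on), and the resulting labels would feed the voting step. The threshold `min_votes` trades recall against precision and would need tuning per project.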

Keywords:
Computer science; Cluster analysis; Software; Ensemble learning; Software bug; Artificial intelligence; Machine learning; Software engineering; Data mining; Programming language
