JOURNAL ARTICLE

Feature Extraction to Heterogeneous Cross Project Defect Prediction

Abstract

Cross-Project Defect Prediction (CPDP) predicts faults in a target project that lacks sufficient defect data by using defect prediction models learned from another project's defect data. However, CPDP studies share a prevalent limitation: they require uniform metrics, i.e. the distinct projects must be described by the same set of features. This article focuses on heterogeneous CPDP (HCPDP) modeling, which does not require the same set of metrics between two applications and instead builds a defect prediction model from matched heterogeneous metrics whose values show comparable distributions for a given pair of datasets. HCPDP modeling consists of three main phases: feature ranking and feature selection, metric matching, and finally binary classification of the unlabeled instances in the target application as clean or buggy using an appropriate classifier. This paper empirically and theoretically evaluates the effect of an additional modeling phase, feature extraction, on the performance of the HCPDP model. Feature selection weeds out obsolete or redundant features from the dataset. The key difference between the two is that feature selection preserves a subset of the original features, while feature extraction produces new ones. The paper compares the performance of the proposed HCPDP model with and without the feature extraction phase on 13 benchmark datasets from three source project groups (AEEEM, ReLink, and SOFTLAB), using three machine learning classifiers. The Chi-Square Test (CST) is used to select features, and Principal Component Analysis (PCA) is used to extract new features from the dataset. Results show that for the prediction pairs (JDT, ar1) and (Safe, ar3), prediction accuracy improved significantly when the feature extraction phase was included in the model. A comparative analysis of the three classifiers shows that GBM performs best for both prediction pairs.
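
The distinction the abstract draws between feature selection and feature extraction can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the helper names (`chi2_scores`, `select_top_k`, `pca_extract`) are hypothetical, and the chi-square score is computed in the common way for non-negative count-like features, which is an assumption about how CST is applied here.

```python
import numpy as np

def chi2_scores(X, y):
    """Chi-square statistic of each non-negative feature against the class labels.

    Assumes every feature has a non-zero column sum (true for the toy data below).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    onehot = (y[:, None] == classes[None, :]).astype(float)   # (n_samples, n_classes)
    observed = onehot.T @ X                                    # per-class feature sums
    expected = np.outer(onehot.mean(axis=0), X.sum(axis=0))    # sums expected under independence
    return ((observed - expected) ** 2 / expected).sum(axis=0)

def select_top_k(X, y, k):
    """Feature selection: return the indices of the k highest-scoring ORIGINAL columns."""
    scores = chi2_scores(X, y)
    return np.sort(np.argsort(scores)[::-1][:k])

def pca_extract(X, n_components):
    """Feature extraction: project centered data onto the top principal directions."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Synthetic stand-in for a defect dataset: 3 metrics related to the label, 5 unrelated.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)                                 # 0 = clean, 1 = buggy
informative = rng.poisson(lam=(3 + 4 * y)[:, None], size=(n, 3))
noise = rng.poisson(lam=3, size=(n, 5))
X = np.hstack([informative, noise]).astype(float)

kept = select_top_k(X, y, k=4)      # selection keeps a subset of the original columns
X_sel = X[:, kept]
X_ext = pca_extract(X_sel, 2)       # extraction builds new, uncorrelated columns

print("selected column indices:", kept)
print("shape after selection:", X_sel.shape)
print("shape after extraction:", X_ext.shape)
```

Every column of `X_sel` is still one of the original metrics, whereas each column of `X_ext` is a linear combination of the selected metrics and no longer corresponds to any single one of them. A classifier such as GBM would then be trained on either `X_sel` or `X_ext` to compare the two settings.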

Keywords:
Computer science; Ranking (information retrieval); Data mining; Feature selection; Artificial intelligence; Feature extraction; Metric (unit); Binary classification; Matching (statistics); Machine learning; Set (abstract data type); Feature (linguistics); Pattern recognition (psychology); Selection (genetic algorithm); Cross-validation; Support vector machine; Mathematics; Engineering; Statistics

Metrics

Cited By: 7
FWCI (Field Weighted Citation Impact): 1.70
Refs: 23
Citation Normalized Percentile: 0.88

Topics

Software Engineering Research
Physical Sciences →  Computer Science →  Information Systems
Software Reliability and Analysis Research
Physical Sciences →  Computer Science →  Software
Software Engineering Techniques and Practices
Physical Sciences →  Computer Science →  Information Systems

Related Documents

JOURNAL ARTICLE

Comprehensive Feature Extraction for Cross-Project Software Defect Prediction

Jagan Mohan Reddy, K. Muthukumaran, Hossain Shahriar, Victor Clincy, Nazmus Sakib

Journal: 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC) Year: 2022 Pages: 450-451

BOOK-CHAPTER

Heterogeneous Cross Project Defect Prediction – A Survey

Rohit Vashisht, Syed A. Rizvi

Communications in computer and information science Year: 2020 Pages: 278-288

JOURNAL ARTICLE

Feature Selection in Cross-Project Software Defect Prediction

Aries Saifudin, Agung Trisetyarso, Wawan Suparta, Chuanze Kang, B S Abbas, Yaya Heryadi

Journal: Journal of Physics Conference Series Year: 2020 Vol: 1569 (2) Pages: 022001-022001