Feature Extraction to Heterogeneous Cross Project Defect Prediction

Rohit Vashisht; Syed A. Rizvi

doi:10.1109/icrito48877.2020.9197799

JOURNAL ARTICLE

Year: 2020 Vol: 27 Pages: 1221-1225

Abstract

Cross Project Defect Prediction (CPDP) predicts faults in a target project that lacks sufficient defect data by using defect prediction models learned from another project's defect data. Such studies, however, share a common limitation: they require uniform metrics, i.e., the source and target projects must be described by the same set of features. This article focuses on heterogeneous CPDP (HCPDP) modeling, which does not require the same set of metrics between two applications and instead builds a defect prediction model on matched heterogeneous metrics whose values show comparable distributions for a given pair of datasets. HCPDP modeling consists of three main phases: feature ranking and feature selection; metric matching; and, finally, binary classification of the unlabeled instances in the target application as clean or buggy using an appropriate classifier. This paper empirically and theoretically evaluates the effect of an additional modeling phase, feature extraction, on the performance of the HCPDP model. Feature selection weeds out obsolete or redundant features from the dataset. The key difference between the two is that feature selection preserves a subset of the original features, whereas feature extraction produces new ones. The paper compares the performance of the proposed HCPDP model, with and without the feature extraction phase, on 13 benchmark datasets from three project groups (AEEEM, ReLink, and SOFTLAB) using three machine learning classifiers. The Chi-Square Test (CST) is used to select features, and Principal Component Analysis (PCA) is used to extract new features from the dataset. Results show that for the prediction pairs (JDT, ar1) and (Safe, ar3), prediction accuracy improved significantly when the feature extraction phase was included in the model. A comparative analysis of the three classifiers shows that GBM performs best for both prediction pairs.
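
The metric-matching phase pairs source and target metrics whose value distributions look alike. The abstract does not say which similarity measure or matching strategy the authors use, so the following is only a minimal sketch under stated assumptions: it uses the two-sample Kolmogorov-Smirnov statistic as the distribution distance and a greedy one-to-one matching. The function names, the `max_gap` cutoff, and the toy metric values are all hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the
    empirical CDFs of samples a and b (0 = identical, 1 = disjoint)."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # Empirical CDFs only change at sample points, so checking those suffices.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def match_metrics(source, target, max_gap=0.5):
    """Greedily pair each source metric with the closest unused target metric.

    `source` and `target` map metric names to lists of observed values.
    Pairs whose KS statistic exceeds `max_gap` (an assumed cutoff) are dropped.
    """
    candidates = sorted(
        (ks_statistic(s_vals, t_vals), s_name, t_name)
        for s_name, s_vals in source.items()
        for t_name, t_vals in target.items()
    )
    used_s, used_t, pairs = set(), set(), []
    for gap, s_name, t_name in candidates:
        if gap <= max_gap and s_name not in used_s and t_name not in used_t:
            pairs.append((s_name, t_name, gap))
            used_s.add(s_name)
            used_t.add(t_name)
    return pairs


# Toy, made-up metric values for two projects with different metric sets.
source = {"loc": [10, 20, 30, 40, 50], "complexity": [1, 2, 2, 3, 4]}
target = {"lines": [12, 22, 28, 41, 55], "branches": [1, 1, 2, 3, 5]}

for s_name, t_name, gap in match_metrics(source, target):
    print(f"{s_name} <-> {t_name} (KS gap {gap:.2f})")
```

Greedy matching is chosen here only for brevity; an optimal assignment over all candidate pairs would be a reasonable alternative. Only the matched source metrics and their target counterparts would then be passed on to the classifier.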

Keywords:

Computer science; Ranking (information retrieval); Data mining; Feature selection; Artificial intelligence; Feature extraction; Metric (unit); Binary classification; Matching (statistics); Machine learning; Set (abstract data type); Feature (linguistics); Pattern recognition (psychology); Selection (genetic algorithm); Cross-validation; Support vector machine; Mathematics; Engineering; Statistics

Metrics

FWCI (Field Weighted Citation Impact): 1.70

Citation Normalized Percentile: 0.88

Topics

Software Engineering Research

Physical Sciences → Computer Science → Information Systems

Software Reliability and Analysis Research

Physical Sciences → Computer Science → Software

Software Engineering Techniques and Practices

Physical Sciences → Computer Science → Information Systems

Related Documents

Comprehensive Feature Extraction for Cross-Project Software Defect Prediction

Heterogeneous Cross Project Defect Prediction in Software

Heterogeneous Cross Project Defect Prediction – A Survey

Heterogeneous Cross-Project Defect Prediction via Optimal Transport

Feature Selection in Cross-Project Software Defect Prediction