Cross-Project Defect Prediction (CPDP) predicts faults in a target project that has insufficient defect data by using defect prediction models learned from another project's defect data. However, most CPDP studies share a common limitation: they require uniform metrics, i.e., the source and target projects must be described by the same set of features. This article focuses on heterogeneous CPDP (HCPDP) modeling, which does not require the same set of metrics between two applications; instead, it builds a defect prediction model on matched heterogeneous metrics whose values show comparable distributions for a given pair of datasets. HCPDP modeling consists of three main phases: feature ranking and feature selection, metric matching, and finally binary classification, in which unlabeled instances in the target application are classified as clean or buggy using an appropriate classifier. This paper empirically and theoretically evaluates the effect of an additional modeling phase, feature extraction, on the performance of the HCPDP model. Feature selection weeds out irrelevant or redundant features from a dataset. The key difference between the two is that feature selection preserves a subset of the original features, whereas feature extraction produces new features derived from the original ones. This paper compares the performance of the proposed HCPDP model on 13 benchmark datasets from three project groups (AEEEM, ReLink, and SOFTLAB), with and without the feature extraction phase, using three machine learning classifiers. We used the Chi-Square Test (CST) to select features and Principal Component Analysis (PCA) to extract new features from the dataset. Results show that for the prediction pairs (JDT, ar1) and (Safe, ar3), prediction accuracy improved significantly when the feature extraction phase was included in the model. A comparative analysis of the three classifiers shows that GBM performs best for both prediction pairs.
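To make the metric matching phase more concrete, the following is a minimal sketch of how metrics from two projects with different metric sets might be paired by the similarity of their value distributions. It is an illustration under stated assumptions, not the model evaluated in this paper: the abstract does not name the matching test, so the two-sample Kolmogorov-Smirnov statistic is assumed here, and the function names, the `max_gap` cutoff, the greedy pairing strategy, and the toy metric values are all hypothetical.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of a and b (0 = identical, 1 = fully separated)."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The largest gap can only occur at an observed value.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def match_metrics(source, target, max_gap=0.5):
    """Greedily pair source and target metrics with the most similar
    distributions. max_gap is an assumed cutoff, not a value from the paper."""
    candidates = sorted(
        (ks_statistic(s_vals, t_vals), s_name, t_name)
        for s_name, s_vals in source.items()
        for t_name, t_vals in target.items()
    )
    pairs, used_source, used_target = [], set(), set()
    for gap, s_name, t_name in candidates:
        if gap > max_gap:
            break  # remaining candidates are even less similar
        if s_name not in used_source and t_name not in used_target:
            pairs.append((s_name, t_name, gap))
            used_source.add(s_name)
            used_target.add(t_name)
    return pairs


# Toy data: metric name -> values over the project's instances (hypothetical).
source = {"loc": [10, 20, 30, 40, 50], "churn": [1, 2, 2, 3, 4]}
target = {"lines": [12, 22, 28, 41, 55], "ops": [1, 1, 2, 3, 5]}

pairs = match_metrics(source, target)
for s_name, t_name, gap in pairs:
    print(f"{s_name} <-> {t_name} (KS gap {gap:.2f})")
```

In this toy example the differently named metrics are paired purely by how their values are distributed, which is the idea that lets a classifier trained on the source project's matched metrics be applied to the target project. A one-to-one greedy pairing is only one possible design; other matching strategies could be substituted.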