JOURNAL ARTICLE

Refactoring Prediction Improvement using Various Feature Selection and Data Sampling Techniques

Abstract

Refactoring is a key means of improving internal software quality without altering the software's functionality during its development life cycle. Software metrics have become a crucial component of the software development process because they allow software quality to be assessed statistically. Applying feature selection, data sampling, and machine learning techniques can speed up computation and increase the effectiveness of a refactoring prediction model.

Objective: This paper aims to improve the predictive performance of refactoring prediction by using several feature selection, data sampling, and machine learning techniques.

Materials and Methods: We implemented three feature selection techniques (SIGF (significant features), CFS, and INFG (Info Gain)) and compared them against the full feature set (OD, all features) to select the most essential features for refactoring prediction. To balance the number of instances of refactored and non-refactored classes, we considered three data sampling settings: ADSYN (Adaptive Synthetic Sampling Technique), UPSAM (Upsampling), and ORG (the original data). Our refactoring prediction model uses four machine learning techniques: Naïve Bayes, K-nearest neighbor (KNN), Decision Tree (DT), and Logistic Regression (LOGR).

Results: Our analysis shows that DT achieves 98% mean accuracy and a mean AUC of 0.92. The model with all features yields 90.55% mean accuracy and a mean AUC of 0.86. The model trained on imbalanced data achieves a lower mean AUC (0.51) but a higher mean accuracy (96.73%). Because the data set is imbalanced, AUC is a more important measure than accuracy. UPSAM yields a mean AUC of 0.88, the highest among the sampling techniques.

Conclusion: The model with all features achieves better accuracy and AUC than the models using SIGF, CFS, or INFG. The model with original data achieves better results than the models with data sampling techniques. DT achieves better mean accuracy and AUC for suggesting the methods to be refactored.
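The gap between accuracy and AUC on imbalanced data, and the effect of upsampling on class balance, can be illustrated with a minimal sketch. This is not the authors' implementation: the 97/3 class split is hypothetical data, the function names `accuracy`, `auc`, and `upsample` are illustrative, and `upsample` is plain random oversampling with replacement, which is only assumed to resemble UPSAM since the abstract does not describe the exact procedure.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (pairwise rank formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def upsample(rows, labels, minority=1, seed=0):
    """Random upsampling: duplicate minority rows (with replacement)
    until both classes have the same number of instances."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_count = len(labels) - len(minority_idx)
    extra = [rng.choice(minority_idx)
             for _ in range(majority_count - len(minority_idx))]
    idx = list(range(len(labels))) + extra
    return [rows[i] for i in idx], [labels[i] for i in idx]


# Hypothetical data: 97 non-refactored (0) and 3 refactored (1) methods.
y_true = [0] * 97 + [1] * 3

# A degenerate model that always answers "not refactored" with one score.
y_pred = [0] * 100
scores = [0.0] * 100

print(accuracy(y_true, y_pred))  # 0.97
print(auc(y_true, scores))       # 0.5

# Balancing the classes before training (rows are stand-in feature vectors).
rows = list(range(100))
rows_bal, y_bal = upsample(rows, y_true)
print(y_bal.count(0), y_bal.count(1))  # 97 97
```

The degenerate model never identifies a refactored method, yet its accuracy is high because the majority class dominates. Its AUC of 0.5 exposes that it ranks positives no better than chance, which is why AUC is the more informative measure when classes are imbalanced.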

Keywords:
Code refactoring; Computer science; Feature selection; Sampling (signal processing); Data mining; Feature (linguistics); Selection (genetic algorithm); Artificial intelligence; Machine learning; Data modeling; Software engineering; Programming language; Software

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.64
Refs: 11
Citation Normalized Percentile: 0.64

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence