JOURNAL ARTICLE

Enhancing Software Defect Prediction: HHO-Based Wrapper Feature Selection with Ensemble Methods

Achmad Fauzan Luthfi, Rudy Herteno, Friska Abadi, Radityo Adi Nugroho, Muhammad Itqan Mazdadi, Vijay Anant Athavale

Year: 2025
Journal: Indonesian Journal of Electronics Electromedical Engineering and Medical Informatics
Vol: 7 (2)
Pages: 188-202

Abstract

The growing complexity of data across domains highlights the need for effective classification models capable of addressing issues such as class imbalance and feature redundancy. The NASA MDP datasets pose such challenges due to their diverse characteristics and highly imbalanced classes, which can significantly affect model accuracy. This study proposes a robust classification framework that integrates advanced preprocessing, optimization-based feature selection, and ensemble learning to enhance predictive performance. The preprocessing phase applied z-score standardization and robust scaling to normalize the data while reducing the impact of outliers. To address class imbalance, the ADASYN technique was employed. Feature selection was performed using Binary Harris Hawk Optimization (BHHO), with K-Nearest Neighbor (KNN) as the evaluator that scores candidate feature subsets. The classification models Random Forest (RF), Support Vector Machine (SVM), and Stacking were evaluated using accuracy, AUC, precision, recall, and F1-measure. Experimental results indicated that the Stacking model achieved superior performance on several datasets, with the MC1 dataset yielding an accuracy of 0.998 and an AUC of 1.000. However, statistical significance testing revealed that not all observed improvements were statistically significant; for example, Stacking significantly outperformed SVM but did not differ significantly from RF in terms of AUC. This underlines the importance of aligning model choice with dataset characteristics. In conclusion, the integration of advanced preprocessing and metaheuristic optimization contributes positively to software defect prediction. Future research should consider more diverse datasets, alternative optimization techniques, and explainable AI to further enhance model reliability and interpretability.
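The wrapper idea in the abstract, where a KNN evaluator scores each candidate feature subset, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the weighted fitness (error rate plus a small penalty on subset size, a form often used in wrapper feature selection), the leave-one-out evaluation, the function names `knn_predict`, `loo_error`, `fitness`, and `best_mask`, and the toy data. An exhaustive search over binary masks stands in for BHHO, whose hawk update rules are not implemented here.

```python
import itertools
import math

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    dists = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

def loo_error(X, y, mask, k=3):
    """Leave-one-out KNN error rate using only the features selected by mask."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 1.0  # an empty subset cannot classify anything
    proj = [[row[j] for j in cols] for row in X]
    wrong = 0
    for i in range(len(proj)):
        train_X = proj[:i] + proj[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, proj[i], k) != y[i]:
            wrong += 1
    return wrong / len(proj)

def fitness(X, y, mask, alpha=0.99, k=3):
    """Assumed wrapper fitness (lower is better): error plus subset-size penalty."""
    ratio = sum(mask) / len(mask)
    return alpha * loo_error(X, y, mask, k) + (1 - alpha) * ratio

def best_mask(X, y, alpha=0.99, k=3):
    """Exhaustive stand-in for the BHHO search: try every non-empty binary mask."""
    n = len(X[0])
    candidates = [m for m in itertools.product((0, 1), repeat=n) if any(m)]
    return min(candidates, key=lambda m: fitness(X, y, m, alpha, k))

# Toy data (invented): feature 0 separates the classes, features 1 and 2 are noise.
X = [
    [0.0, 5, 3], [0.1, -5, -3], [0.2, 5, -3], [0.3, -5, 3],
    [1.0, 5, 3], [1.1, -5, -3], [1.2, 5, -3], [1.3, -5, 3],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

best = best_mask(X, y)
print(best)  # (1, 0, 0)
```

A metaheuristic such as BHHO would replace `best_mask` when the number of features makes enumeration impractical, since the number of candidate masks doubles with each added feature; the fitness function is the part a wrapper method reuses unchanged.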

Keywords:
Feature selection; Selection (genetic algorithm); Software bug; Computer science; Software; Artificial intelligence; Feature (linguistics); Pattern recognition (psychology); Data mining; Machine learning

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 9.66
Refs: 0
Citation Normalized Percentile: 0.93

Topics

Software Engineering Research
Physical Sciences → Computer Science → Information Systems
Software Reliability and Analysis Research
Physical Sciences → Computer Science → Software
Software Engineering Techniques and Practices
Physical Sciences → Computer Science → Information Systems