JOURNAL ARTICLE

Enhancing malware detection with feature selection and scaling techniques using machine learning models

Abstract

The increasing prevalence of malware presents a critical challenge to cybersecurity and underscores the need for robust detection methods. This study uses a binary tabular classification dataset to evaluate the impact of feature selection, feature scaling, and machine learning (ML) models on malware detection. The methodology compares three feature scaling options (no scaling, normalization, and min-max scaling), three feature selection options (no selection, Linear Discriminant Analysis (LDA), and Principal Component Analysis (PCA)), and twelve ML models, including traditional algorithms and ensemble methods. A publicly available dataset with 11,598 samples and 139 features is used, and model performance is assessed using accuracy, precision, recall, F1-score, and AUC-ROC. The Light Gradient Boosting Machine (LGBM) achieves the highest accuracy, 97.16%, when PCA is combined with either min-max scaling or normalization. Ensemble models also consistently outperform traditional ML models. These findings offer guidance on choosing preprocessing steps and models when building reliable and efficient malware detection systems.

Keywords:
Feature selection, Computer science, Normalization (sociology), Artificial intelligence, Preprocessor, Linear discriminant analysis, Machine learning, Malware, Boosting (machine learning), Gradient boosting, Pattern recognition (psychology), Data mining, Model selection, Principal component analysis, Data pre-processing, Feature (linguistics), Scaling, Random forest, Mathematics

Metrics

Cited By: 22
FWCI (Field Weighted Citation Impact): 117.19
Refs: 100
Citation Normalized Percentile: 1.00
Is in top 1%
Is in top 10%

Topics

Advanced Malware Detection Techniques
Physical Sciences →  Computer Science →  Signal Processing
Network Security and Intrusion Detection
Physical Sciences →  Computer Science →  Computer Networks and Communications
Anomaly Detection Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

A Static Feature Selection-based Android Malware Detection Using Machine Learning Techniques

Aviral Sangal, Harsh Kumar Verma

Journal: 2020 International Conference on Smart Electronics and Communication (ICOSEC) Year: 2020 Pages: 48-51

JOURNAL ARTICLE

Android malware detection applying feature selection techniques and machine learning

Mohammad Reza Keyvanpour, Mehrnoush Barani Shirzad, Farideh Heydarian

Journal: Multimedia Tools and Applications Year: 2022 Vol: 82 (6) Pages: 9517-9531

BOOK-CHAPTER

Enhancing Obfuscated Malware Detection with Machine Learning Techniques

Quang-Vinh Dang

Communications in computer and information science Year: 2022 Pages: 731-738