JOURNAL ARTICLE

An Enhanced Malware Detection Approach using Machine Learning and Feature Selection

Srastika, Nanditha Bhandary, R S Shalakha, Prasad B. Honnavalli, E. Sivaraman

Year: 2022
Journal: 2022 3rd International Conference on Electronics and Sustainable Communication Systems (ICESC)
Pages: 909-914

Abstract

Cybersecurity has become a major concern as technology advances. Malware is a serious threat that can easily evade prevalent traditional detection methods such as signature matching. Because malware evolves constantly, identifying new malware with traditional methods is extremely difficult, and machine learning models are more effective at the task. This paper compares and discusses the performance of Random Forest, K-Nearest Neighbours (KNN), and XGBoost with and without appropriate feature selection algorithms. The feature selection algorithms used are Boruta and Analysis of Variance (ANOVA): Random Forest uses Boruta as both feature selector and feature ranker, whereas KNN is coupled with ANOVA. Feature selection significantly reduces the dimensionality of the dataset and eliminates some redundant features without human intervention. It also reduces overfitting and makes the model more explainable and interpretable. The models coupled with feature selection are evaluated using performance metrics. The paper also studies how the importance of different features for classifying a Windows portable executable (PE) file as malware varies over time; since malware evolves, it is important to know which variables are responsible for the change. Although many machine learning models can identify malware, they should also aid explainability, so the paper focuses on models and features that additionally improve explainability.
The experimental results show that a machine learning model can detect malware effectively with as few as 27 features of a portable executable file.
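The pairing of ANOVA feature selection with KNN described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data is synthetic (five numeric columns, of which columns 0 and 2 are constructed to differ between the two classes), the helper names `anova_f_score`, `select_k_best`, and `knn_predict` are hypothetical, and the choices `k=2` and `n_neighbors=5` are assumptions made only for illustration. The paper's dataset, PE feature names, and hyperparameters are not given in this record.

```python
import random
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across the class groups."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    grand_mean = mean(values)
    ss_between = 0.0  # spread of the class means around the grand mean
    ss_within = 0.0   # spread of the samples around their own class mean
    for members in groups.values():
        group_mean = mean(members)
        ss_between += len(members) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in members)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)


def select_k_best(rows, labels, k):
    """Return the (sorted) column indices of the k features with the highest F score."""
    n_features = len(rows[0])
    scores = [anova_f_score([row[j] for row in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def knn_predict(train_rows, train_labels, query, n_neighbors=5):
    """Majority vote among the nearest training rows (squared Euclidean distance)."""
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, query)), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = [label for _, label in distances[:n_neighbors]]
    return max(set(votes), key=votes.count)


# Synthetic stand-in data (an assumption, not the paper's dataset):
# label 1 plays the role of "malware", label 0 of "benign".
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
rows = [
    [
        rng.gauss(3.0 * y, 1.0),   # column 0: informative
        rng.gauss(0.0, 1.0),       # column 1: noise
        rng.gauss(-2.0 * y, 1.0),  # column 2: informative
        rng.gauss(0.0, 1.0),       # column 3: noise
        rng.gauss(0.0, 1.0),       # column 4: noise
    ]
    for y in labels
]
train_rows, test_rows = rows[:150], rows[150:]
train_labels, test_labels = labels[:150], labels[150:]

# Rank features on the training split only, then keep the same columns for both splits.
selected = select_k_best(train_rows, train_labels, k=2)
train_reduced = [[row[j] for j in selected] for row in train_rows]
test_reduced = [[row[j] for j in selected] for row in test_rows]

correct = sum(
    knn_predict(train_reduced, train_labels, query) == label
    for query, label in zip(test_reduced, test_labels)
)
accuracy = correct / len(test_rows)
print("selected columns:", selected, "accuracy:", accuracy)
```

Scoring features on the training split alone keeps the held-out rows from influencing which columns are kept. In practice a library routine would likely replace the hand-written F score and KNN; they are written out here only to keep the sketch self-contained.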

Keywords:
Feature selection, Malware, Computer science, Overfitting, Artificial intelligence, Random forest, Feature (linguistics), Machine learning, Feature extraction, Curse of dimensionality, k-nearest neighbors algorithm, Selection (genetic algorithm), Data mining, Pattern recognition (psychology), Artificial neural network, Computer security

Metrics

Cited By: 12
FWCI (Field Weighted Citation Impact): 1.68
Refs: 13
Citation Normalized Percentile: 0.85

Topics

Advanced Malware Detection Techniques
Physical Sciences →  Computer Science →  Signal Processing
Network Security and Intrusion Detection
Physical Sciences →  Computer Science →  Computer Networks and Communications
Anomaly Detection Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence