An Enhanced Malware Detection Approach using Machine Learning and Feature Selection

Srastika; Nanditha Bhandary; R S Shalakha; Prasad B. Honnavalli; E. Sivaraman

doi:10.1109/icesc54411.2022.9885509

Year: 2022
Journal: 2022 3rd International Conference on Electronics and Sustainable Communication Systems (ICESC)
Pages: 909-914

Abstract

Cybersecurity has become a major concern as technology advances. Malware is among the most serious security threats, and it can easily evade traditional detection methods such as signature matching. Because malware evolves constantly, identifying new samples with traditional methods is extremely difficult; machine learning models are more effective. This paper compares the performance of Random Forest, K-Nearest Neighbours (KNN), and XGBoost with and without appropriate feature selection. The feature selection algorithms used are Boruta and Analysis of Variance (ANOVA): Random Forest uses Boruta as both feature selector and feature ranker, whereas KNN is paired with ANOVA. Feature selection significantly reduces the dimensionality of the dataset and eliminates redundant features without human intervention. It also reduces overfitting and makes the model more explainable and interpretable. The models coupled with feature selection are evaluated using performance metrics. The paper also studies how the importance of different features for classifying a Windows portable executable (PE) file as malware varies over time. Since malware evolves over time, it is important to know which variables are responsible. Although many machine learning models can identify malware, they should also aid explainability, so the paper focuses on models and features that additionally improve it.
The experimental results show that, with as few as 27 features of a portable executable file, a machine learning model can detect malware effectively.
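
The ANOVA-plus-KNN pairing described in the abstract can be sketched roughly as follows. This is a minimal illustration in pure Python on synthetic data, not the authors' code, dataset, or settings: the toy data generator, the function names, the 150/50 split, and the choice to keep 5 of 40 features are all assumptions made for the example (the abstract reports 27 PE features). Each feature is scored with the one-way ANOVA F statistic, the highest-scoring features are kept, and KNN is run on the reduced data.

```python
import random
from statistics import mean


def make_toy_data(n_rows=200, n_features=40, n_informative=5, seed=0):
    """Synthetic stand-in for a PE feature table (assumption, not real data).

    The first n_informative columns shift with the label; the rest are noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_rows):
        label = rng.randint(0, 1)  # 1 = "malware", 0 = "benign" in this toy setup
        row = [
            rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
            for j in range(n_features)
        ]
        X.append(row)
        y.append(label)
    return X, y


def anova_f_scores(X, y):
    """One-way ANOVA F statistic for each feature column."""
    classes = sorted(set(y))
    n, k = len(y), len(classes)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        grand_mean = mean(col)
        groups = [[v for v, lab in zip(col, y) if lab == c] for c in classes]
        group_means = [mean(g) for g in groups]
        # Between-group and within-group sums of squares.
        ss_between = sum(
            len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
        )
        ss_within = sum(
            (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
        )
        ms_between = ss_between / (k - 1)
        ms_within = ss_within / (n - k)
        scores.append(ms_between / ms_within if ms_within > 0 else float("inf"))
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k nearest training rows (squared Euclidean)."""
    nearest = sorted(
        zip(X_train, y_train),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


X, y = make_toy_data()
scores = anova_f_scores(X, y)
selected = select_top_k(scores, k=5)  # hypothetical k; the paper reports 27
X_reduced = [[row[j] for j in selected] for row in X]

split = 150  # first 150 rows train, remaining 50 test (arbitrary choice)
correct = sum(
    knn_predict(X_reduced[:split], y[:split], x) == target
    for x, target in zip(X_reduced[split:], y[split:])
)
accuracy = correct / (len(y) - split)
print("selected feature indices:", sorted(selected))
print("held-out accuracy:", accuracy)
```

For brevity the F scores here are computed on all rows; in a real evaluation the selector would be fitted on the training portion only, so that the held-out rows do not influence which features are kept.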

Keywords:

Feature selection; Malware; Computer science; Overfitting; Artificial intelligence; Random forest; Feature (linguistics); Machine learning; Feature extraction; Curse of dimensionality; k-nearest neighbors algorithm; Selection (genetic algorithm); Data mining; Pattern recognition (psychology); Artificial neural network; Computer security

Metrics

FWCI (Field Weighted Citation Impact): 1.68

Citation Normalized Percentile: 0.85

Topics

Advanced Malware Detection Techniques

Physical Sciences → Computer Science → Signal Processing

Network Security and Intrusion Detection

Physical Sciences → Computer Science → Computer Networks and Communications

Anomaly Detection Techniques and Applications

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

A Hybrid Feature Selection Approach-Based Android Malware Detection Framework Using Machine Learning Techniques

Feature Selection-Based Machine Learning Model for Malware Detection

FEATURE SELECTION AND MACHINE LEARNING CLASSIFICATION FOR MALWARE DETECTION

Malware Detection Using Machine Learning Approach

Enhanced Malware Detection Using Machine Learning Algorithms