Srastika, Nanditha Bhandary, R S Shalakha, Prasad B. Honnavalli, E. Sivaraman
Cybersecurity has become a major concern as technology advances. Malware is among the most serious security threats, and it can easily evade prevalent traditional detection methods such as signature matching. Because malware evolves constantly, identifying new malware with traditional methods is extremely difficult. Machine learning models are more effective at identifying malware. Here, the performance of machine learning models such as Random Forest, K-Nearest Neighbours (KNN) and XGBoost, with and without appropriate feature selection algorithms, is compared and discussed. The feature selection algorithms used are Boruta and Analysis of Variance (ANOVA). Random Forest uses Boruta as both a feature selector and a feature ranker, whereas KNN is coupled with ANOVA. Feature selection is useful because it significantly reduces the dimensionality of the dataset, and some redundant features can be eliminated without human intervention. Feature selection also reduces overfitting and makes the model more explainable and interpretable. The performance of these models coupled with feature selection is evaluated using performance metrics. How the importance of different features for classifying Windows portable executable (PE) files as malware varies over time is also studied. Since malware evolves over time, it is important to identify the features responsible for that change. Although many machine learning models can identify malware, it is important that they also support explainability. The paper therefore focuses on models and features that additionally improve explainability.
The experimental results show that a machine learning model can detect malware effectively with as few as 27 features of a portable executable file.
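To make the ANOVA-based selection step concrete, the following is a minimal sketch of ranking features by their one-way ANOVA F-statistic and keeping the top k. It is an illustration only, not the authors' pipeline: the data are synthetic (not PE features), and the names `anova_f_scores` and `select_top_k` are hypothetical helpers introduced here. In practice, scikit-learn's `f_classif` together with `SelectKBest` computes the same kind of ranking.

```python
import random
from statistics import mean


def anova_f_scores(X, y):
    """Return the one-way ANOVA F-statistic of each feature column against the labels."""
    classes = sorted(set(y))
    n, k = len(y), len(classes)
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        grand_mean = mean(column)
        groups = [[v for v, label in zip(column, y) if label == c] for c in classes]
        group_means = [mean(g) for g in groups]
        # Between-group sum of squares: distance of each class mean from the grand mean.
        ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
        # Within-group sum of squares: spread of samples around their own class mean.
        ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
        ms_between = ss_between / (k - 1)
        ms_within = ss_within / (n - k)
        scores.append(ms_between / ms_within if ms_within > 0 else float("inf"))
    return scores


def select_top_k(scores, k):
    """Return the (sorted) indices of the k features with the highest F-statistic."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Synthetic stand-in data (assumption): 6 features, of which only
# features 0 and 3 depend on the class label (0 = benign, 1 = malware).
random.seed(0)
informative = {0, 3}
X, y = [], []
for i in range(200):
    label = i % 2
    row = [random.gauss(2.0 * label if j in informative else 0.0, 1.0) for j in range(6)]
    X.append(row)
    y.append(label)

scores = anova_f_scores(X, y)
selected = select_top_k(scores, k=2)
print("F-scores:", [round(s, 2) for s in scores])
print("Selected feature indices:", selected)
```

A classifier such as KNN would then be trained only on the selected columns. The value of k is a tuning choice; the abstract reports that 27 PE features were sufficient in the authors' experiments.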