JOURNAL ARTICLE

Malware Detection Using K-Nearest Neighbor Algorithm and Feature Selection

Abstract

Malware is one of the biggest threats in today’s digital era. Malware detection is crucial because it protects devices and systems from the harm malware causes, such as data loss or damage, data theft, account break-ins, and intrusions that can give attackers full access to a system. Malware has also evolved from its traditional (monomorphic) form to modern forms (polymorphic, metamorphic, and oligomorphic), so detection systems need to move beyond signature matching toward machine learning. This research addresses malware detection by classifying files as malware or goodware with k-Nearest Neighbor (kNN), a machine learning classification algorithm. To improve kNN performance, the number of features was reduced using two methods, Information Gain and Principal Component Analysis (PCA), and the performance of kNN under each method was compared. With PCA, reducing the features to 32 principal components (PCs), kNN maintained its classification performance, reaching an accuracy of 95.6% and an F1-score of 95.6%. For a like-for-like comparison, Information Gain was then applied with the same number of features: features were ranked by Information Gain score, from highest to lowest, and the top 32 were kept. With Information Gain, kNN's accuracy and F1-score both increased to 96.9%.
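The abstract does not give implementation details, so the following is only a minimal, stdlib-only Python sketch of the Information Gain branch (rank features by IG, keep the top k, classify with kNN) under stated assumptions: features are discrete (here binary, e.g. whether an API call is present), labels are 1 = malware and 0 = goodware, kNN uses Euclidean distance with a majority vote, and k = 3. The function names (`entropy`, `information_gain`, `top_k_features`, `select_columns`, `knn_predict`) and the toy data are hypothetical illustrations, not the authors' code or dataset.

```python
import math
from collections import Counter
from typing import List, Sequence

# Assumption: each sample is a list of discrete (here binary) features,
# e.g. 1 = "API call present"; label 1 = malware, 0 = goodware.


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy H(Y), in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(X: List[List[int]], y: List[int], j: int) -> float:
    """IG(Y; X_j) = H(Y) - sum over values v of P(X_j = v) * H(Y | X_j = v)."""
    n = len(y)
    conditional = 0.0
    for v in set(row[j] for row in X):
        subset = [label for row, label in zip(X, y) if row[j] == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(y) - conditional


def top_k_features(X: List[List[int]], y: List[int], k: int) -> List[int]:
    """Rank feature indices by IG score (highest first) and keep the best k."""
    scores = [(information_gain(X, y, j), j) for j in range(len(X[0]))]
    scores.sort(key=lambda s: (-s[0], s[1]))  # ties: lower index first
    return [j for _, j in scores[:k]]


def select_columns(X: List[List[int]], cols: List[int]) -> List[List[int]]:
    """Keep only the selected feature columns of every sample."""
    return [[row[j] for j in cols] for row in X]


def knn_predict(X_train: List[List[int]], y_train: List[int],
                x: List[int], k: int = 3) -> int:
    """Majority vote among the k training samples nearest to x (Euclidean)."""
    nearest = sorted((math.dist(row, x), label)
                     for row, label in zip(X_train, y_train))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical toy data: features 0 and 2 separate the classes perfectly.
X = [[1, 0, 1, 0], [1, 1, 1, 0], [1, 0, 1, 1],
     [0, 0, 0, 1], [0, 1, 0, 0], [0, 0, 0, 1]]
y = [1, 1, 1, 0, 0, 0]

cols = top_k_features(X, y, k=2)   # the paper keeps 32; 2 suits the toy data
X_sel = select_columns(X, cols)
print(cols)                                  # [0, 2]
# Query vectors are expressed in the selected-feature space.
print(knn_predict(X_sel, y, [1, 1], k=3))    # 1 (malware)
print(knn_predict(X_sel, y, [0, 0], k=3))    # 0 (goodware)
```

The PCA branch differs in kind: instead of keeping 32 of the original features, it projects all of them onto 32 principal components. In scikit-learn this would roughly correspond to `PCA(n_components=32)` from `sklearn.decomposition` followed by `KNeighborsClassifier` from `sklearn.neighbors`; the paper's actual k, distance metric, and preprocessing are not stated in the abstract. Continuous features would need discretization before the IG computation above, and IG scores should be computed on training data only to avoid leaking test labels into feature selection. `math.dist` requires Python 3.8 or later.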

Keywords:
Malware, Computer science, Feature selection, Information gain ratio, k-nearest neighbors algorithm, Artificial intelligence, Information gain, Feature, Data mining, Machine learning, Pattern recognition, Sorting, Principal component analysis, Algorithm, Computer security

Metrics

Cited by: 5
FWCI (Field Weighted Citation Impact): 3.56
References: 28
Citation Normalized Percentile: 0.88


Topics

Advanced Malware Detection Techniques (Physical Sciences → Computer Science → Signal Processing)
Information Retrieval and Data Mining (Physical Sciences → Computer Science → Information Systems)
Multimedia Learning Systems (Physical Sciences → Computer Science → Information Systems)
