Feature selection is a vital preprocessing step before applying any machine learning algorithm. It aims to reduce the number of features in a dataset by removing irrelevant, noisy, and redundant ones. Feature selection can be viewed as an optimization problem whose goal is to maximize or minimize an evaluation measure for the machine learning task, mainly classification. Metaheuristic algorithms are optimization algorithms that can be applied to feature selection. In this research, a wrapper feature selection model based on Differential Evolution (DE) is compared with filter methods such as Chi2 and ReliefF. Three classification algorithms, k-Nearest Neighbors (KNN), Support Vector Machines (SVM), and Decision Trees (DT), are used to evaluate the feature selection algorithms. The proposed model is tested on a recent malware dataset obtained from the UCI repository. The results show that DT achieves the highest accuracy and performs consistently well under both wrapper and filter feature selection, so it can be considered the most effective algorithm for this dataset, although SVM and KNN remain viable alternatives depending on specific requirements or preferences.
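The two approaches compared in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual setup: it uses scikit-learn's breast-cancer data as a stand-in for the malware dataset, a Chi2 filter (`SelectKBest`), and a toy DE wrapper that searches over real-valued vectors thresholded into binary feature masks, with cross-validated Decision Tree accuracy as the fitness. Population size, generation count, and the DE parameters F and CR are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)   # stand-in for the malware dataset
n_features = X.shape[1]
clf = DecisionTreeClassifier(random_state=0)

# --- Filter approach: rank features by the chi2 statistic, keep the top k ---
k = 10
X_chi2 = SelectKBest(chi2, k=k).fit_transform(X, y)
acc_chi2 = cross_val_score(clf, X_chi2, y, cv=5).mean()

# --- Wrapper approach: a toy DE search over binary feature masks ---
rng = np.random.default_rng(0)

def fitness(vec):
    mask = vec > 0.5                          # threshold real vector -> feature mask
    if not mask.any():
        return 0.0                            # penalize empty feature subsets
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop = rng.random((8, n_features))             # small population of candidate vectors
scores = np.array([fitness(p) for p in pop])
F, CR = 0.8, 0.9                              # DE mutation factor and crossover rate
for _ in range(5):                            # a few DE generations
    for i in range(len(pop)):
        a, b, c = pop[rng.choice(len(pop), 3, replace=False)]
        mutant = np.clip(a + F * (b - c), 0, 1)     # DE/rand/1 mutation
        cross = rng.random(n_features) < CR
        trial = np.where(cross, mutant, pop[i])     # binomial crossover
        s = fitness(trial)
        if s >= scores[i]:                          # greedy selection
            pop[i], scores[i] = trial, s

best_mask = pop[scores.argmax()] > 0.5
print(f"chi2 filter (top {k} features): {acc_chi2:.3f}")
print(f"DE wrapper ({best_mask.sum()} features): {scores.max():.3f}")
```

The key contrast is that the filter step scores features once, independently of any classifier, while the wrapper re-trains the classifier for every candidate subset, which is more expensive but tailored to the model being evaluated.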
Maied Ayash Alanazi, Maheyzah Md Siraj, Fuad A. Ghaleb