JOURNAL ARTICLE

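Performance Comparison of Tree-Based Classifiers for Anomaly Detection on High-Dimensional and Imbalanced Data (original title: Komparasi Performa Tree-Based Classifier Untuk Deteksi Anomali Pada Data Berdimensi Tinggi dan Tidak Seimbang)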

Abstract

Anomaly detection is one approach to securing network traffic, but it is challenged by high data dimensionality and class imbalance, both of which can degrade the performance of a detection system. A feature selection technique is therefore needed to reduce dimensionality by eliminating irrelevant features, and the selected features must be validated with a suitable classification algorithm to achieve high anomaly detection performance. The purpose of this study is to identify a combination of feature selection technique and classification algorithm that yields a system able to detect attacks in high-dimensional, imbalanced data. Chi-square feature selection was used to eliminate irrelevant features. To determine the most suitable classification algorithm, the study tested and compared several tree-based classifiers: REPTree, J48, Random Tree, and Random Forest. The study also examines how well these classification techniques detect traffic in high-dimensional, imbalanced data. The best-performing test is taken as the recommendation for the ideal combination of feature selection technique and classification algorithm. The resulting anomaly detection system achieves high performance. The experiments use the CICIDS-2017 dataset, which is high-dimensional and imbalanced. The test results show an accuracy of 99.983% for Random Tree and 99.984% for Random Forest.
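The workflow described in the abstract (chi-square feature selection followed by a comparison of tree-based classifiers) can be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the study's implementation: it uses scikit-learn, a synthetic dataset in place of CICIDS-2017, and scikit-learn estimators as loose stand-ins for the classifiers named in the abstract (REPTree, J48, and Random Tree are classifier names from the Weka toolkit and have no exact scikit-learn equivalents). The function name `compare_tree_classifiers`, the choice of 20 retained features, and all dataset parameters are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier


def compare_tree_classifiers(n_features_kept=20, seed=0):
    """Fit chi-square selection + several tree models; return test scores."""
    # Assumption: synthetic stand-in for a high-dimensional, imbalanced
    # traffic dataset (about 5% "attack" rows). This is NOT CICIDS-2017.
    X, y = make_classification(
        n_samples=4000, n_features=78, n_informative=15,
        weights=[0.95, 0.05], random_state=seed,
    )
    # Stratify so the rare class appears in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    # Assumption: rough scikit-learn stand-ins, not the Weka classifiers.
    models = {
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "random_tree": DecisionTreeClassifier(
            max_features="sqrt", random_state=seed
        ),
        "random_forest": RandomForestClassifier(
            n_estimators=50, random_state=seed
        ),
    }
    results = {}
    for name, model in models.items():
        # chi2 requires non-negative inputs, so scale to [0, 1] first.
        pipeline = make_pipeline(
            MinMaxScaler(), SelectKBest(chi2, k=n_features_kept), model
        )
        pipeline.fit(X_train, y_train)
        y_pred = pipeline.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "f1_attack": f1_score(y_test, y_pred, pos_label=1),
        }
    return results


if __name__ == "__main__":
    for name, scores in compare_tree_classifiers().items():
        print(name, scores)
```

On imbalanced data, accuracy can look high even when the minority class is poorly detected, so the sketch also reports the F1 score for the attack class alongside accuracy.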

Keywords:
Computer science; Random forest; Feature selection; C4.5 algorithm; Pattern recognition (psychology); Artificial intelligence; Statistical classification; Curse of dimensionality; Dimensionality reduction; Data mining; Classifier (UML); Random projection; Support vector machine; Naive Bayes classifier

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.76
Refs: 13
Citation Normalized Percentile: 0.72

Topics

Data Mining and Machine Learning Applications
Physical Sciences →  Computer Science →  Information Systems
Network Security and Intrusion Detection
Physical Sciences →  Computer Science →  Computer Networks and Communications
Information Retrieval and Data Mining
Physical Sciences →  Computer Science →  Information Systems

Related Documents

JOURNAL ARTICLE

KOMPARASI METODE SMOTE DAN ADASYN UNTUK PENANGANAN DATA TIDAK SEIMBANG MULTICLASS

Fandi Yulian Pamuji, Sephia Dwi Arma Putri

Journal: Jurnal Informatika Polinema, Year: 2023, Vol: 9 (3), Pages: 331-338
JOURNAL ARTICLE

Komparasi Algoritma Kasifikasi dengan Pendekatan Level Data Untuk Menangani Data Kelas Tidak Seimbang

Ahmad Ilham

Journal: JURNAL ILMIAH ILMU KOMPUTER, Year: 2017, Vol: 3 (1), Pages: 1-6
JOURNAL ARTICLE

ALGORITMA PRINCIPAL COMPONENT ANALYSIS UNTUK MENINGKATKAN PERFORMA FUZZY C-MEANS PADA KLASTERISASI DATASET BERDIMENSI TINGGI

Agung Riyadi, Fauziah Fauziah

Journal: Jurnal Ilmiah Teknologi dan Rekayasa, Year: 2024, Vol: 29 (2), Pages: 99-115
JOURNAL ARTICLE

Algoritma Swarm Intelligence untuk Data Berdimensi Tinggi pada Machine Learning: Review

Joan Angelina Widians, Ade Fiqri Tjikoa

Journal: Jurnal Rekayasa Teknologi Informasi (JURTI), Year: 2024, Vol: 8 (1), Pages: 11-11
JOURNAL ARTICLE

Hybrid Ensemble Learning Sistem Keamanan Jaringan Untuk Meningkatkan Performa Deteksi Anomali

Rony Heri Irawan Irawan, Nico Adi Saputra, Umi Mahdiyah

Journal: Nusantara of Engineering (NOE), Year: 2025, Vol: 8 (02), Pages: 361-369