JOURNAL ARTICLE

Machine Learning for Encrypted Malware Traffic Classification

Abstract

The application of machine learning for the detection of malicious network traffic has been well researched over the past several decades; it is particularly appealing when the traffic is encrypted because traditional pattern-matching approaches cannot be used. Unfortunately, the promise of machine learning has been slow to materialize in the network security domain. In this paper, we highlight two primary reasons why this is the case: inaccurate ground truth and a highly non-stationary data distribution. To demonstrate and understand the effect that these pitfalls have on popular machine learning algorithms, we design and carry out experiments that show how six common algorithms perform when confronted with real network data. With our experimental results, we identify the situations in which certain classes of algorithms underperform on the task of encrypted malware traffic classification. We offer concrete recommendations for practitioners given the real-world constraints outlined. From an algorithmic perspective, we find that the random forest ensemble method outperformed competing methods. More importantly, feature engineering was decisive; we found that iterating on the initial feature set, and including features suggested by domain experts, had a much greater impact on the performance of the classification system than the choice of algorithm. For example, linear regression using the more expressive feature set easily outperformed the random forest method using a standard network traffic representation on all criteria considered. Our analysis is based on millions of TLS-encrypted sessions collected over 12 months from a commercial malware sandbox and two geographically distinct, large enterprise networks.

Keywords:
Sandbox (software development); Computer science; Traffic classification; Encryption; Malware; Machine learning; Random forest; Feature engineering; Artificial intelligence; Feature (linguistics); Domain (mathematical analysis); Feature learning; Data mining; Set (abstract data type); Representation (politics); Deep learning; The Internet; Computer security

Metrics

Cited By: 251
FWCI (Field Weighted Citation Impact): 14.21
Refs: 41
Citation Normalized Percentile: 0.99
Is in top 1%
Is in top 10%

Topics

Internet Traffic Analysis and Secure E-voting
Physical Sciences →  Computer Science →  Artificial Intelligence
Network Security and Intrusion Detection
Physical Sciences →  Computer Science →  Computer Networks and Communications
Advanced Malware Detection Techniques
Physical Sciences →  Computer Science →  Signal Processing

Related Documents

JOURNAL ARTICLE

Analyzing Learning-based Encrypted Malware Traffic Classification with AutoML

Didier Frank Isingizwe, Meng Wang, Wenmao Liu, Dongsheng Wang, Tiejun Wu, Jun Li

Journal: 2021 IEEE 21st International Conference on Communication Technology (ICCT) Year: 2021 Pages: 313-322
JOURNAL ARTICLE

Malware Detection in Encrypted TLS Traffic using Machine Learning Techniques

Deok-Jo Jeon, Dong-Gue Park

Journal: The Journal of Korean Institute of Information Technology Year: 2021 Vol: 19 (10) Pages: 125-136
JOURNAL ARTICLE

Encrypted network traffic classification based on machine learning

Reham Taher El-Maghraby, Nada M. Abdel Aziem, Mohamed Sobh, Ayman M. Bahaa-Eldin

Journal: Ain Shams Engineering Journal Year: 2023 Vol: 15 (2) Pages: 102361-102361