Encrypted HTTPS traffic now dominates the Internet, and malware increasingly uses TLS to conceal command-and-control activity. Because payloads cannot be inspected, detection must rely on metadata such as TLS handshake fields and certificate attributes, which prior work has shown can still reveal malicious behavior. This research evaluates whether malicious HTTPS connections can be detected using only metadata from Zeek logs. Using the CTU-SME-11 dataset, we build a reproducible preprocessing pipeline and a 33-feature connection-level representation that captures flow statistics, TLS behavior, and certificate validity characteristics. We evaluate XGBoost, multilayer perceptrons, and several CNN variants (including 1D and 2D grid-based embeddings) using a stratified capture-level split and 5-fold capture-aware cross-validation, so that connections from the same capture never appear in both training and test data. The results show strong discriminative performance: XGBoost achieves the highest ROC-AUC and PR-AUC, while the CNN-based models, particularly an 8×8 architecture, achieve the highest malicious-class F1 scores. These findings indicate that metadata-based models can accurately detect malicious encrypted traffic, and they motivate future work on generalization, calibration, and explainability.
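The capture-aware evaluation summarized above can be illustrated with a short sketch. It is not the authors' pipeline: the helper name `capture_aware_folds`, the round-robin assignment, and stratifying captures by whether they contain any malicious connection are assumptions made for illustration. What it shows is that folds are built from whole captures, so connections from one capture never land on both the training and test side of a split, which is the leakage the capture-aware design is meant to prevent.

```python
import random
from collections import defaultdict


def capture_aware_folds(capture_ids, labels, k=5, seed=0):
    """Hypothetical helper: split connections into k (train, test) index pairs
    such that every capture's connections fall entirely in one test fold."""
    # Group connection indices by the capture they were extracted from.
    by_capture = defaultdict(list)
    for i, cap in enumerate(capture_ids):
        by_capture[cap].append(i)

    # Assumed capture-level stratification: captures containing any malicious
    # connection (label 1) are dealt out separately from benign-only captures.
    malicious = sorted(c for c, idx in by_capture.items()
                       if any(labels[i] == 1 for i in idx))
    malicious_set = set(malicious)
    benign = sorted(c for c in by_capture if c not in malicious_set)

    rng = random.Random(seed)
    fold_of = {}
    offset = 0
    for stratum in (malicious, benign):
        rng.shuffle(stratum)
        for j, cap in enumerate(stratum):
            # Round-robin keeps the number of captures per fold balanced.
            fold_of[cap] = (offset + j) % k
        offset += len(stratum)

    all_idx = set(range(len(capture_ids)))
    folds = []
    for f in range(k):
        # A capture's connections go to the test side only in its own fold.
        test = sorted(i for cap, idx in by_capture.items()
                      if fold_of[cap] == f for i in idx)
        train = sorted(all_idx - set(test))
        folds.append((train, test))
    return folds


# Example: 10 hypothetical captures of 3 connections each, 5 of them malicious.
caps = [f"cap{c}" for c in range(10) for _ in range(3)]
labels = [1 if c < 5 else 0 for c in range(10) for _ in range(3)]
for train, test in capture_aware_folds(caps, labels):
    # No capture may appear on both sides of a split.
    assert not {caps[i] for i in train} & {caps[i] for i in test}
```

A plain random split over connections would scatter the connections of one capture across folds and let a model memorize capture-specific artifacts; grouping by capture avoids that. scikit-learn's `StratifiedGroupKFold` serves a similar purpose, and the stdlib version here only makes the grouping logic explicit.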
Xinyu Liu, Ruijie Zhao, Ming Liu, Libo Chen, Lingyun Ying, Zhengguang Han, Zhi Xue
Yuecheng Wen, Xiaohui Han, Wenbo Zuo, Weihua Liu
Tianyi Wang, Haifeng Wang, Wenbin Wang, Zuocong Chen, Bowen Ye, Manfang Dou