We present a Transformer-based intrusion detection system (IDS) for IoT network flows. Raw traffic is converted into windowed flow sequences (47 features; 30-s window; 10-s stride; sequence length 64) and fed to a compact Transformer encoder (4 layers, 8 heads, hidden size 256) with two output heads, one for binary anomaly detection and one for multiclass attack-type classification. Evaluated on UNSW-NB15, BoT-IoT, and ToN_IoT against CNN, LSTM, Random Forest, and SVM baselines, the model achieves state-of-the-art discrimination with lower false-alarm rates (FAR): on UNSW-NB15, F1 = 95.1%, FAR = 2.1%, ROC-AUC = 0.984; on BoT-IoT, F1 = 97.2%, FAR = 1.4%, ROC-AUC = 0.992; on ToN_IoT, F1 = 92.9%, FAR = 2.6%, ROC-AUC = 0.973. Precision–Recall analysis shows higher PR-AUC and better precision at matched recall than all baselines, consistent with fewer benign flows being escalated as alerts. Attention maps and SHAP attributions surface the features and time steps that drive each decision (e.g., SYN bursts, DNS probing, TLS exfiltration cues); these are distilled into short reason codes attached to each alert. A deployment-oriented alert policy (a default threshold with an abstain band, 2-of-3 window aggregation, session de-duplication, and rate limiting) turns scores into compact, auditable outputs suitable for operations.
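As an illustration of how the alert policy described above could fit together, here is a minimal Python sketch. It is not the authors' implementation. The class name `AlertPolicy`, the threshold of 0.5, the abstain margin of 0.1, and the rate limit of 5 alerts per 60 s are all assumptions chosen for the example; the abstract names the four components but gives no parameter values. The sketch also assumes that an abstained window counts as a non-vote in the 2-of-3 aggregation and that at most one alert is emitted per session.

```python
from collections import defaultdict, deque


class AlertPolicy:
    """Hypothetical sketch of the alert policy; every default is assumed."""

    def __init__(self, threshold=0.5, abstain_margin=0.1,
                 max_alerts=5, per_seconds=60.0):
        # Scores strictly between lo and hi fall in the abstain band.
        self.hi = threshold + abstain_margin
        self.lo = threshold - abstain_margin
        self.max_alerts = max_alerts
        self.per_seconds = per_seconds
        # Last three window votes per session (True = confident attack).
        self.votes = defaultdict(lambda: deque(maxlen=3))
        self.alerted = set()   # sessions that already produced an alert
        self.recent = deque()  # timestamps of recently emitted alerts

    def label(self, score):
        """Map an anomaly score to 'attack', 'benign', or 'abstain'."""
        if score >= self.hi:
            return "attack"
        if score <= self.lo:
            return "benign"
        return "abstain"

    def update(self, session_id, score, now):
        """Feed one window score; return True if an alert should be emitted."""
        self.votes[session_id].append(self.label(score) == "attack")
        # 2-of-3 aggregation: need two confident attack votes in the last three.
        if sum(self.votes[session_id]) < 2:
            return False
        # Session de-duplication: one alert per session.
        if session_id in self.alerted:
            return False
        # Rate limiting: drop timestamps that left the sliding time window.
        while self.recent and now - self.recent[0] >= self.per_seconds:
            self.recent.popleft()
        if len(self.recent) >= self.max_alerts:
            return False
        self.alerted.add(session_id)
        self.recent.append(now)
        return True
```

In this sketch a rate-limited session is not marked as alerted, so it can still raise an alert once the rate window clears. Whether suppressed alerts should instead be queued or dropped is a design choice the abstract does not specify.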
Junyao Feng, Chao-hong Wang, Hao Xue, Lijun Zhang
R Lalduhsaka, Nilutpol Bora, Ajoy Kumar Khan
Jorge Casajús-Setién, Concha Bielza, Pedro Larrañaga