Lihu Pan, Bingyi Li, Shouxin Peng, Rui Zhang, Linliang Zhang
ABSTRACT Video anomaly detection (VAD), a critical task in intelligent surveillance systems, faces two key challenges: characterizing dynamic behavior in complex scenarios and robustly modeling spatiotemporal context. Existing methods suffer from inadequate cross‐scale feature fusion, weak channel‐wise dependency modeling, and sensitivity to background noise. To address these issues, we propose a novel multi‐scale spatiotemporal feature augmentation framework. Our approach introduces three core innovations: (1) a hierarchical feature pyramid architecture for multi‐granularity representation learning, which captures both local motion patterns and global scene semantics; (2) a channel‐adaptive attention mechanism that dynamically models long‐range spatiotemporal dependencies; and (3) a spatiotemporal Gaussian difference module that enhances the anomaly response through frequency‐domain feature reconstruction, effectively suppressing noise interference. Extensive experiments on the UCSD Ped1/2, CUHK Avenue, and ShanghaiTech benchmarks demonstrate that our method achieves state‐of‐the‐art performance, outperforming existing approaches in both accuracy and robustness.
Le Wang, Junwen Tian, Sanping Zhou, Haoyue Shi, Gang Hua
Leonardo Rossi, Vittorio Bernuzzi, Tomaso Fontanini, Massimo Bertozzi, Andrea Prati
K. Deepak, S. Chandrakala, C. Krishna Mohan