Ummay Maria Muna, Shanta Biswas, Syed Abu Ammar Muhammad Zarif, Philip Jefferson Deori, Tauseef Tajwar, Swakkhar Shatabda
Automated video anomaly detection (VAD) is challenging because anomalies are context-dependent and sporadic, but recent advances in deep learning offer promising solutions. In this paper, we propose a novel framework for detecting anomalies in videos by analyzing spatial and temporal (spatio-temporal) features. We address challenges such as the processing of lengthy videos and the sparse occurrence of anomalies by segmenting videos and labeling their anomalous parts. We employ a modified pre-trained vision transformer for video feature extraction, leveraging its ability to capture complex spatio-temporal patterns and global context. Additionally, we incorporate a parameter-efficient recurrent model, the Simple Recurrent Unit Plus Plus (SRU++), which processes long sequences of video embeddings efficiently, reducing computational cost tenfold compared to traditional methods. To further enhance multiclass prediction performance, we develop a cluster-based weighting mechanism that assigns weights to classification scores based on feature similarity. We extensively evaluate our approach on three popular datasets: UCF-Crime, RWF-2000, and Smart City CCTV Violence Detection (SCVD). It achieves superior performance compared to state-of-the-art methods, making it well suited for real-world surveillance applications.
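The abstract does not specify how the cluster-based weighting is computed, so the following is only an illustrative sketch of one way such a mechanism could work, not the authors' method. It assumes one cluster per class, represented by the mean training feature, cosine similarity as the similarity measure, and a softmax over similarities as the weights. The function names (`class_centroids`, `reweight_scores`) and the `temperature` parameter are hypothetical.

```python
import numpy as np


def class_centroids(features, labels, num_classes):
    """Mean feature vector per class (assumption: one cluster per class)."""
    return np.stack(
        [features[labels == c].mean(axis=0) for c in range(num_classes)]
    )


def cosine_similarity(feature, centroids):
    """Cosine similarity between one feature vector and each centroid."""
    f = feature / np.linalg.norm(feature)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return c @ f


def reweight_scores(scores, feature, centroids, temperature=1.0):
    """Scale class scores by similarity-derived weights, then renormalize.

    Assumption: weights are a softmax over cosine similarities; the
    `temperature` knob is hypothetical and controls how sharp they are.
    """
    sims = cosine_similarity(feature, centroids)
    weights = np.exp((sims - sims.max()) / temperature)
    weights /= weights.sum()
    weighted = scores * weights
    return weighted / weighted.sum()


if __name__ == "__main__":
    # Toy 2-D embeddings for two classes (made-up values for illustration).
    train_features = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])
    train_labels = np.array([0, 0, 1, 1])
    centroids = class_centroids(train_features, train_labels, num_classes=2)

    # An ambiguous classifier output is nudged toward the nearer cluster.
    scores = np.array([0.5, 0.5])
    print(reweight_scores(scores, np.array([1.0, 0.0]), centroids))
```

In this sketch, a sample whose embedding lies close to a class cluster has that class's score boosted relative to the others, which is one plausible reading of "weights based on feature similarity".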
Hongchun Yuan, Zhenyu Cai, Hui Zhou, Yue Wang, Xiangzhi Chen
R Karthik, S Adithya, P Shalmiya, V. Subramaniyaswamy
Biao Guo, Mingrui Liu, Qian He, Ming Jiang
Shimpei Kobayshi, Akiyoshi Hizukuri, Ryohei Nakayama