Y.-W. Wang, Yang Chen, Chai Kiat Yeo
Surveillance cameras are extensively deployed across public and private environments, driving the need for intelligent video monitoring systems. A major challenge in this setting is Weakly Supervised Video Anomaly Detection (WSVAD), where supervision is limited to video-level labels, which makes snippet-level anomaly localisation particularly difficult. WSVAD is commonly formulated as a Multiple Instance Learning (MIL) problem, in which each video is treated as a bag of snippets. Although recent approaches have achieved encouraging results by modelling spatio-temporal dynamics, they often overlook the semantic information within videos that could further improve anomaly detection. To bridge this gap, we propose enriching feature representations with object-centric features extracted using object detection techniques. These features supply high-level semantic information that helps discriminate anomalous events. Experiments on two benchmark datasets, UCF-Crime and ShanghaiTech, show that our approach achieves performance comparable to state-of-the-art (SOTA) methods. The results suggest that incorporating object-level semantics is a promising direction for improving WSVAD and for semantic-aware anomaly detection more broadly.
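The abstract does not describe the architecture, so the following is only a minimal sketch of the generic MIL setup it refers to, not the authors' method. It assumes concatenation as the fusion of snippet features with object-centric features, a linear scorer with a sigmoid, and top-k mean pooling. All function names (`fuse_features`, `snippet_scores`, `topk_mean`, `mil_bce_loss`) and the toy data are hypothetical. The key point is that only the video-level label enters the loss, while the per-snippet scores provide localisation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def fuse_features(snippet_feats: Sequence[Sequence[float]],
                  object_feats: Sequence[Sequence[float]]) -> List[Vector]:
    """Concatenate each snippet feature with its object-centric feature.

    Concatenation is an assumed fusion strategy, chosen for illustration.
    """
    if len(snippet_feats) != len(object_feats):
        raise ValueError("need one object feature vector per snippet")
    return [list(s) + list(o) for s, o in zip(snippet_feats, object_feats)]


def snippet_scores(fused: Sequence[Sequence[float]],
                   weights: Sequence[float], bias: float) -> List[float]:
    """Score each snippet (MIL instance) with a linear layer and a sigmoid."""
    scores = []
    for f in fused:
        z = sum(w * x for w, x in zip(weights, f)) + bias
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores


def topk_mean(scores: Sequence[float], k: int) -> float:
    """MIL pooling: average the k highest snippet scores into a video score."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of snippets")
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / k


def mil_bce_loss(video_score: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy between the pooled score and the video-level label."""
    p = min(max(video_score, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# Toy video (the bag): 4 snippets with 2-d spatio-temporal features and a
# 1-d hypothetical object-centric feature (e.g. a detector confidence).
snippets = [[0.1, 0.2], [0.0, 0.1], [0.9, 0.8], [0.2, 0.1]]
objects = [[0.0], [0.0], [1.0], [0.0]]

fused = fuse_features(snippets, objects)
scores = snippet_scores(fused, weights=[1.0, 1.0, 2.0], bias=-1.0)
video_score = topk_mean(scores, k=1)
loss = mil_bce_loss(video_score, label=1)  # only the video-level label is known
print([round(s, 3) for s in scores], round(video_score, 3), round(loss, 3))
```

With `k=1` the pooling reduces to max pooling over snippets. A larger `k` is a common way to make the video score less sensitive to a single noisy snippet, though the best choice would depend on the model and dataset.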
Ali Enver Bilecen, Hüseyin Özkan
Francisco Caetano, Pedro Carvalho, Christina Mastralexi, Jaime S. Cardoso
Weikang Wang, Yuting Su, Jing Liu, Wei Sun, Guangtao Zhai
Huixin Wu, Mengfan Yang, Fupeng Wei, Ge Shi, Wei Jiang, Yaqiong Qiao, Hangcheng Dong