Fupeng Wei, Yibo Jiao, Nan Wang, Kai Zheng, Ge Shi, Mengfan Yang, Zhao We
The detection of abnormal behavior has consistently attracted significant attention. Conventional methods employ vision-based two-stream networks or 3D convolutions to represent spatio-temporal information in video sequences and distinguish normal from abnormal behaviors. However, these methods generally rely on datasets that are balanced across categories and contain only two classes. In practice, abnormal behaviors frequently exhibit multi-category characteristics, and the distribution of each category shows a pronounced long-tail phenomenon. This paper presents a video-based method for detecting multi-category abnormal behavior, termed the Spatio-Temporal Fusion–Temporal Difference Network (STF-TDN). The method first employs a temporal difference network (TDN) to capture video temporal dynamics through local and global modeling. To improve recognition performance, this study develops a feature fusion module, Spatio-Temporal Fusion (STF), which strengthens the model's representational capacity by combining spatial and temporal information. Furthermore, given the long-tailed distribution of the datasets, this study adopts focal loss in place of the conventional cross-entropy loss to improve the model's recognition of under-represented categories. We conduct comprehensive experiments and ablation studies on two datasets, achieving a precision of 96.3% on the Violence5 dataset and 87.5% on the RWF-2000 dataset. The experimental results demonstrate the effectiveness of the proposed method in detecting abnormal behavior.
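The abstract's use of focal loss to handle long-tailed class distributions can be illustrated with a minimal NumPy sketch of the standard formulation, FL(p_t) = -α(1 - p_t)^γ log(p_t). The α and γ values below are common defaults, not settings reported by the paper; the point is only that confident (easy) predictions are down-weighted so rare categories contribute relatively more to the loss.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Mean focal loss over a batch of multi-class predictions.

    probs:  (N, C) softmax probabilities.
    labels: (N,) integer class indices.
    The (1 - p_t)**gamma factor shrinks the loss of well-classified
    samples, shifting emphasis toward hard / under-represented classes.
    With gamma=0 and alpha=1 this reduces to ordinary cross-entropy.
    """
    pt = probs[np.arange(len(labels)), labels]  # probability of the true class
    return float(np.mean(-alpha * (1.0 - pt) ** gamma * np.log(pt)))
```

For a batch that is already classified confidently, the focal loss is much smaller than plain cross-entropy, which is exactly the re-weighting effect exploited for long-tailed data.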
Hai Chuan Liu, Joon Huang Chuah, Anis Salwa Mohd Khairuddin, Xian Min Zhao, Xiao Dan Wang
Bo Wang, Mao Ye, Xue Li, Fengjuan Zhao
Dongliang Jin, Songhao Zhu, Xian Sun, Zhiwei Liang, Guozheng Xu
Zhilei Li, Jun Li, Yuqing Ma, Rui Wang, Zhiping Shi, Yifu Ding, Xianglong Liu