Video anomaly detection (VAD) safeguards diverse environments by identifying unexpected events or behaviours within video streams, making it crucial for security surveillance, industrial monitoring, and public safety. This paper addresses the challenges of video anomaly detection and explores novel solutions. The proposed model employs the Swin Transformer, a robust architecture originally designed for image classification, for feature extraction. Leveraging its adaptability, the model captures spatial dependencies and temporal nuances within video sequences, enabling accurate anomaly detection. Instead of fully supervised training, whose labelling is expensive and time-consuming, weakly supervised learning was adopted; such techniques have been used extensively in VAD and have proven fruitful. Experiments were conducted on the UCF-Crime dataset, and the evaluation metrics of the proposed Swin Transformer-based model were computed and compared with those of many pre-existing video anomaly detection models. The proposed model achieved superior results compared to many contemporary methods, demonstrating the potential of the Swin Transformer for enhancing real-world video anomaly detection applications.