Dasheng Zhang, Chao Huang, Chengliang Liu, Yong Xu
Weakly supervised video anomaly detection is a challenging problem because training videos lack refined frame-level labels. Most prior works address it with the multiple instance learning (MIL) paradigm, which divides a video into multiple snippets and trains a snippet classifier to distinguish anomalous from normal snippets via a video-level classification loss. However, these solutions suffer from insufficient feature representations. In this paper, we propose a novel weakly supervised temporal relation learning framework for anomaly detection, which efficiently explores the temporal relations between snippets and enhances the discriminative power of features using only video-level labels. To this end, we design a transformer-enabled feature encoder that converts the input task-agnostic features into discriminative task-specific features by mining the semantic similarity and positional relations between snippets. As a result, our model can detect anomalies in the current video snippet more accurately based on the learned discriminative features. Experimental results show that the proposed method outperforms existing state-of-the-art approaches, demonstrating its effectiveness.
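To make the MIL paradigm described above concrete, the following is a minimal NumPy sketch, not the authors' implementation: per-snippet anomaly scores are pooled into a single video-level score (here via top-k mean pooling, a common MIL choice) so that a binary cross-entropy loss can be computed from the video-level label alone. The function name `mil_video_loss` and the pooling parameter `k` are illustrative assumptions.

```python
import numpy as np

def mil_video_loss(snippet_scores, video_label, k=3):
    """Video-level BCE loss from per-snippet anomaly scores (MIL sketch).

    snippet_scores: (T,) array of per-snippet anomaly probabilities in (0, 1).
    video_label: 1 if the video contains an anomaly anywhere, else 0.
    k: number of highest-scoring snippets pooled into the video score
       (an illustrative choice; MIL variants also use max pooling).
    """
    topk = np.sort(snippet_scores)[-k:]        # k highest-scoring snippets
    video_score = float(np.mean(topk))         # pooled video-level score
    eps = 1e-8                                 # numerical stability
    return -(video_label * np.log(video_score + eps)
             + (1 - video_label) * np.log(1.0 - video_score + eps))

# A video with a few confidently anomalous snippets gets a low loss when
# labelled anomalous; an all-normal video gets a low loss when labelled normal.
anomalous = np.array([0.05, 0.10, 0.90, 0.95, 0.85, 0.10])
normal    = np.array([0.05, 0.10, 0.08, 0.12, 0.07, 0.10])
loss_a = mil_video_loss(anomalous, video_label=1)  # small
loss_n = mil_video_loss(normal, video_label=0)     # small
```

Only the video-level label enters the loss, which is exactly why the snippet classifier never sees frame-level supervision and why richer snippet features, such as those from the temporal relation encoder above, matter.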