Anomaly detection in videos is a challenging task because of several constraints: anomalies are rare and therefore heavily imbalanced against normal events, annotated data is limited, and existing supervised and semi-supervised algorithms learn poorly under such imbalance. Transformer-based unsupervised learning may be a promising way to overcome these issues. In this paper, we propose a generative transfer learning (GTL) algorithm for video anomaly detection that uses an unsupervised learning approach. The framework consists of three major parts: (1) a feature extractor; (2) a generator transformer; and (3) a discriminator transformer. The feature extractor generates spatio-temporal features from unlabeled input segments and passes them to the generator transformer, which tries to reconstruct these features. Using cooperative learning between the generator transformer and the discriminator transformer, we train the network so that anomalies yield a high reconstruction error and non-anomalies a low one. We test the proposed model on two well-known anomaly detection datasets (UCF-Crime and ShanghaiTech) and report state-of-the-art results.
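The scoring idea behind the reconstruction step can be illustrated with a minimal sketch. It is not the model described above: the feature vectors are assumed to be already extracted and reconstructed, and the function names, the min-max normalization, and the 0.5 threshold are hypothetical choices made only for illustration.

```python
from typing import List, Sequence


def reconstruction_errors(
    features: Sequence[Sequence[float]],
    reconstructions: Sequence[Sequence[float]],
) -> List[float]:
    """Mean squared error between each segment feature and its reconstruction."""
    if len(features) != len(reconstructions):
        raise ValueError("features and reconstructions must have the same length")
    errors = []
    for original, rebuilt in zip(features, reconstructions):
        if len(original) != len(rebuilt):
            raise ValueError("feature and reconstruction dimensions differ")
        squared = [(a - b) ** 2 for a, b in zip(original, rebuilt)]
        errors.append(sum(squared) / len(squared))
    return errors


def normalize(scores: Sequence[float]) -> List[float]:
    """Min-max scale scores to [0, 1] (an assumed normalization, not from the paper)."""
    low, high = min(scores), max(scores)
    if high == low:
        return [0.0 for _ in scores]
    return [(s - low) / (high - low) for s in scores]


def flag_anomalies(scores: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Flag segments whose normalized score exceeds a hypothetical threshold."""
    return [s > threshold for s in scores]


# Toy data: three segments with two-dimensional features (illustrative values only).
features = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
reconstructions = [[0.0, 1.0], [1.0, 0.5], [0.0, 2.0]]

errors = reconstruction_errors(features, reconstructions)
scores = normalize(errors)
flags = flag_anomalies(scores)
print(errors, scores, flags)
```

In this toy input the third segment is reconstructed poorly, so it receives the highest score and is the only one flagged. In practice the threshold would need to be chosen on held-out data.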