JOURNAL ARTICLE

CLIP-TSA: CLIP-Assisted Temporal Self-Attention for Weakly-Supervised Video Anomaly Detection

Abstract

Video anomaly detection (VAD) is a challenging problem in video surveillance in which the anomalous frames must be localized in an untrimmed video. Because frame-level annotation is labor-intensive, VAD is commonly formulated as a weakly-supervised multiple-instance learning problem. In this paper, we first propose to use the ViT-encoded visual features from CLIP, in contrast with the C3D or I3D features conventional in the domain, to efficiently extract discriminative representations. We then model temporal dependencies and nominate the snippets of interest with our proposed Temporal Self-Attention (TSA). An ablation study confirms the effectiveness of TSA and the ViT features. Extensive experiments show that the proposed CLIP-TSA outperforms existing state-of-the-art (SOTA) methods by a large margin on three commonly used VAD benchmark datasets (UCF-Crime, ShanghaiTech Campus, and XD-Violence). Our source code is available at https://github.com/joos2010kj/CLIP-TSA.
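The multiple-instance formulation mentioned in the abstract can be illustrated with a small, generic sketch. It is not the authors' TSA module or their training code (see the linked repository for that). It assumes random stand-in vectors in place of CLIP features, plain scaled dot-product self-attention with no learned projections, a hypothetical linear scoring head, and a mean-of-top-k rule for turning snippet scores into a video-level score. All function names are illustrative.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(features):
    """Scaled dot-product self-attention across snippets (no learned weights)."""
    dim = len(features[0])
    scale = math.sqrt(dim)
    attended = []
    for query in features:
        # Similarity of this snippet to every snippet in the video.
        sims = [sum(q * k for q, k in zip(query, key)) / scale for key in features]
        weights = softmax(sims)
        # Each output is a weighted average of all snippet features.
        attended.append([
            sum(w * value[d] for w, value in zip(weights, features))
            for d in range(dim)
        ])
    return attended


def snippet_scores(features, head_weights, bias=0.0):
    """Per-snippet anomaly score in (0, 1) from a hypothetical linear head."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(head_weights, f)) + bias)))
        for f in features
    ]


def video_score(scores, k=3):
    """MIL-style aggregation: mean of the k highest snippet scores."""
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / len(top)


rng = random.Random(0)
T, D = 8, 4  # 8 snippets with 4-dimensional stand-in features (assumption)
features = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(T)]
head = [rng.gauss(0.0, 1.0) for _ in range(D)]

attended = temporal_self_attention(features)
scores = snippet_scores(attended, head)
print(len(scores), video_score(scores))
```

In a weakly-supervised setting only the video-level label is available, so a loss would be applied to `video_score` while the per-snippet `scores` provide the localization at inference time.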

Keywords:
Computer science; Discriminative model; Margin (machine learning); Benchmark (surveying); Anomaly detection; Artificial intelligence; Feature (linguistics); Anomaly (physics); Code (set theory); Pattern recognition (psychology); Machine learning

Metrics

Cited By: 63
FWCI (Field Weighted Citation Impact): 16.09
Refs: 52
Citation Normalized Percentile: 0.99
Is in top 1%
Is in top 10%

Topics

Anomaly Detection Techniques and Applications
Physical Sciences → Computer Science → Artificial Intelligence
Network Security and Intrusion Detection
Physical Sciences → Computer Science → Computer Networks and Communications
Artificial Immune Systems Applications
Physical Sciences → Engineering → Biomedical Engineering

Related Documents

JOURNAL ARTICLE

CLIP: Assisted Video Anomaly Detection

Dong Meng

Year: 2024 Pages: 522-533
JOURNAL ARTICLE

Inter-Clip Feature Similarity Based Weakly Supervised Video Anomaly Detection via Multi-Scale Temporal MLP

Yuanhong Zhong, Ran Zhu, Ge Yan, Ping Gan, Xingfu Shen, Dong Zhu

Journal: IEEE Transactions on Circuits and Systems for Video Technology Year: 2024 Vol: 35 (2) Pages: 1961-1970
JOURNAL ARTICLE

Weakly supervised video anomaly detection with temporal attention module

Wonjoon Song, Jonghyun Kim, Joongkyu Kim

Journal: 2022 37th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC) Year: 2022 Pages: 1-4