JOURNAL ARTICLE

WSVAD-CLIP: Temporally Aware and Prompt Learning with CLIP for Weakly Supervised Video Anomaly Detection

Min Li, Jing Sang, Yuanyao Lu, Lina Du

Year: 2025   Journal: Journal of Imaging   Vol: 11 (10)   Pages: 354   Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Weakly Supervised Video Anomaly Detection (WSVAD) is a critical task in computer vision. It aims to localize and recognize abnormal behaviors using only video-level labels. Without frame-level annotations, it becomes significantly challenging to model temporal dependencies. Given the diversity of abnormal events, it is also difficult to model semantic representations. Recently, the cross-modal pre-trained model Contrastive Language-Image Pretraining (CLIP) has shown a strong ability to align visual and textual information. This provides new opportunities for video anomaly detection. Inspired by CLIP, WSVAD-CLIP is proposed as a framework that uses its cross-modal knowledge to bridge the semantic gap between text and vision. First, the Axial-Graph (AG) Module is introduced. It combines an Axial Transformer and Lite Graph Attention Networks (LiteGAT) to capture global temporal structures and local abnormal correlations. Second, a Text Prompt mechanism is designed. It fuses a learnable prompt with a knowledge-enhanced prompt to improve the semantic expressiveness of category embeddings. Third, the Abnormal Visual-Guided Text Prompt (AVGTP) mechanism is proposed to aggregate anomalous visual context for adaptively refining textual representations. Extensive experiments on UCF-Crime and XD-Violence datasets show that WSVAD-CLIP notably outperforms existing methods in coarse-grained anomaly detection. It also achieves superior performance in fine-grained anomaly recognition tasks, validating its effectiveness and generalizability.
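The abstract describes refining textual representations with aggregated anomalous visual context, then relying on CLIP-style alignment between visual and textual features. The sketch below is only a toy illustration of that general idea, not the authors' implementation. The function names, the score-weighted averaging, the additive refinement, and the `alpha` mixing weight are all assumptions made for this example. The actual AVGTP mechanism may differ substantially.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def anomaly_context(frame_feats, frame_scores):
    """Average frame features weighted by their anomaly scores.

    Hypothetical stand-in for "aggregating anomalous visual context".
    """
    total = sum(frame_scores)
    dim = len(frame_feats[0])
    if total == 0:
        return [0.0] * dim
    return [
        sum(s * f[d] for s, f in zip(frame_scores, frame_feats)) / total
        for d in range(dim)
    ]


def refine_text_embeddings(text_embs, context, alpha=0.5):
    """Add the scaled visual context to every class text embedding (assumed form)."""
    return [[t + alpha * c for t, c in zip(emb, context)] for emb in text_embs]


def alignment_scores(frame_feats, text_embs):
    """Cosine similarity between each frame and each class embedding."""
    frames = [l2_normalize(f) for f in frame_feats]
    texts = [l2_normalize(t) for t in text_embs]
    return [[sum(a * b for a, b in zip(f, t)) for t in texts] for f in frames]


# Toy data: three frames with 2-D features; the last two are scored as anomalous.
frame_feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
frame_scores = [0.0, 1.0, 1.0]
text_embs = [[1.0, 0.0], [0.0, 1.0]]  # row 0: "normal" class, row 1: "abnormal" class

context = anomaly_context(frame_feats, frame_scores)
refined = refine_text_embeddings(text_embs, context)
scores = alignment_scores(frame_feats, refined)
```

In this toy setup the anomalous frames align most strongly with the second class embedding after refinement, while the first frame still aligns best with the first class.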

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 9.64
Refs: 38
Citation Normalized Percentile: 0.97

Topics

Anomaly Detection Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence
Network Security and Intrusion Detection
Physical Sciences →  Computer Science →  Computer Networks and Communications
Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition