JOURNAL ARTICLE

Adaptive Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization

Yuanhao Zhai, Le Wang, Wei Tang, Qilin Zhang, Nanning Zheng, David Doermann, Junsong Yuan, Gang Hua

Year: 2022   Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence   Vol: 45 (4)   Pages: 4136-4151   Publisher: IEEE Computer Society

Abstract

Weakly-supervised temporal action localization (W-TAL) aims to classify and localize all action instances in untrimmed videos under only video-level supervision. Without frame-level annotations, it is challenging for W-TAL methods to clearly distinguish actions from background, which severely degrades action boundary localization and action proposal scoring. In this paper, we present an adaptive two-stream consensus network (A-TSCN) to address this problem. Our A-TSCN features an iterative refinement training scheme: a frame-level pseudo ground truth is generated and iteratively updated from a late-fusion activation sequence, and is used to provide frame-level supervision for improved model training. In addition, we introduce an adaptive attention normalization loss, which adaptively selects action and background snippets according to the video's attention distribution. By differentiating the attention values of the selected action snippets and background snippets, it forces the predicted attention to act as a binary selection and promotes the precise localization of action boundaries. Furthermore, we propose a video-level and a snippet-level uncertainty estimator, which mitigate the adverse effect of learning from noisy pseudo ground truth. Experiments conducted on the THUMOS14, ActivityNet v1.2, ActivityNet v1.3, and HACS datasets show that our A-TSCN outperforms current state-of-the-art methods and even achieves performance comparable to several fully-supervised methods.
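
The abstract describes two mechanisms: deriving a frame-level pseudo ground truth from a late-fusion activation sequence, and a loss that separates the attention values of selected action and background snippets. The following is only an illustrative sketch of those two ideas, not the authors' implementation. The function names, the fusion weight `beta`, the fixed `threshold`, and the rule of splitting snippets at the video's mean attention are all assumptions made for this example; the abstract does not specify any of them.

```python
def late_fusion(rgb, flow, beta=0.5):
    """Weighted average of two per-snippet activation sequences.

    `beta` is a hypothetical fusion weight, not a value from the paper.
    """
    return [beta * r + (1.0 - beta) * f for r, f in zip(rgb, flow)]


def pseudo_ground_truth(fused, threshold=0.5):
    """Binarize fused activations into per-snippet pseudo labels.

    Thresholding at a fixed value is an assumption of this sketch.
    """
    return [1.0 if a > threshold else 0.0 for a in fused]


def attention_separation_loss(attention):
    """Mean background attention minus mean action attention.

    Snippets above the video's mean attention are treated as action and
    the rest as background (an assumed selection rule). Minimizing the
    result pushes the two groups apart, toward a binary selection.
    """
    cut = sum(attention) / len(attention)
    action = [a for a in attention if a > cut]
    background = [a for a in attention if a <= cut]
    if not action or not background:
        return 0.0  # all values equal: nothing to separate
    return sum(background) / len(background) - sum(action) / len(action)


if __name__ == "__main__":
    rgb = [0.9, 0.2, 0.7, 0.1]
    flow = [0.7, 0.4, 0.1, 0.1]
    fused = late_fusion(rgb, flow)
    print(pseudo_ground_truth(fused))
    print(attention_separation_loss([0.6, 0.4, 0.6, 0.4]))
```

In an iterative scheme like the one the abstract outlines, the pseudo labels would be regenerated from the fused sequence after each round of training and used as frame-level targets for the next round. Under this sketch's definition, a more sharply binary attention sequence yields a lower loss than a soft one.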

Keywords:
Computer science, Ground truth, Artificial intelligence, Machine learning, Frame (networking), Snippet, Action (physics), Pattern recognition (psychology), Information retrieval

Metrics

Cited By: 17
FWCI (Field Weighted Citation Impact): 2.10
Refs: 71
Citation Normalized Percentile: 0.86

Topics

Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Anomaly Detection Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence
Diabetic Foot Ulcer Assessment and Management
Health Sciences →  Medicine →  Endocrinology, Diabetes and Metabolism