Ziyi Liu, Le Wang, Qilin Zhang, Wei Tang, Junsong Yuan, Nanning Zheng, Gang Hua
The objective of Weakly-supervised Temporal Action Localization (WS-TAL) is to localize all action instances in an untrimmed video with only video-level supervision. Due to the lack of frame-level annotations during training, current WS-TAL methods rely on attention mechanisms to localize the foreground snippets or frames that contribute to the video-level classification task. This strategy frequently confuses context with the actual action in the localization results. Separating action and context is a core problem for precise WS-TAL, but it is very challenging and has been largely ignored in the literature. In this paper, we introduce an Action-Context Separation Network (ACSNet) that explicitly takes context into account for accurate action localization. It consists of two branches (i.e., the Foreground-Background branch and the Action-Context branch). The Foreground-Background branch first distinguishes foreground from background within the entire video, while the Action-Context branch further separates the foreground into action and context. We associate video snippets with two latent components (i.e., a positive component and a negative component), and their different combinations can effectively characterize foreground, action, and context. Furthermore, we introduce extended labels with auxiliary context categories to facilitate the learning of action-context separation. Experiments on the THUMOS14 and ActivityNet v1.2/v1.3 datasets demonstrate that ACSNet outperforms existing state-of-the-art WS-TAL methods by a large margin.
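To make the two-branch idea concrete, below is a minimal, hypothetical PyTorch sketch. The module name `ACSNetSketch`, the feature dimensions, and the specific way the positive/negative latent components are combined into foreground, action, and context attentions are all our assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class ACSNetSketch(nn.Module):
    """Hypothetical sketch of action-context separation via two latent components.

    Each snippet is mapped to a positive and a negative latent component;
    their combinations are read out as foreground, action, and context
    attention weights. This is an illustration, not the paper's code.
    """

    def __init__(self, feat_dim: int = 2048, hidden_dim: int = 512):
        super().__init__()
        # Shared temporal embedding over snippet features.
        self.embed = nn.Sequential(
            nn.Conv1d(feat_dim, hidden_dim, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Two latent components per snippet (assumed parameterization).
        self.pos = nn.Conv1d(hidden_dim, 1, kernel_size=1)  # positive component
        self.neg = nn.Conv1d(hidden_dim, 1, kernel_size=1)  # negative component

    def forward(self, x: torch.Tensor):
        # x: (batch, feat_dim, num_snippets) snippet-level features.
        h = self.embed(x)
        p = torch.sigmoid(self.pos(h))  # positive component in [0, 1]
        n = torch.sigmoid(self.neg(h))  # negative component in [0, 1]
        # Foreground-Background branch: a snippet is foreground
        # if either latent component responds (assumed combination).
        foreground = torch.clamp(p + n, max=1.0)
        # Action-Context branch: action is driven by the positive
        # component alone; context by the negative component alone.
        action = p * (1.0 - n)
        context = n * (1.0 - p)
        return foreground, action, context


# Usage: a batch of 16 videos, 2048-d features, 100 snippets each.
if __name__ == "__main__":
    model = ACSNetSketch()
    feats = torch.randn(16, 2048, 100)
    fg, act, ctx = model(feats)
    print(fg.shape, act.shape, ctx.shape)  # each (16, 1, 100)
```

In this sketch the three attentions are tied to the same pair of latent components, so suppressing context (the negative component) directly sharpens the action attention, which mirrors the separation the abstract describes.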