Multi-Hierarchical Category Supervision for Weakly-Supervised Temporal Action Localization

Guozhang Li; Jie Li; Nannan Wang; Xinpeng Ding; Zhifeng Li; Xinbo Gao

doi:10.1109/tip.2021.3124671

JOURNAL ARTICLE

Year: 2021; Journal: IEEE Transactions on Image Processing; Vol: 30; Pages: 9332-9344; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Weakly Supervised Temporal Action Localization (WTAL) aims to localize action segments in untrimmed videos using only video-level category labels during training. In WTAL, an action generally consists of a series of sub-actions, and different action categories may share common sub-actions. However, to distinguish action categories using only video-level class labels, current WTAL models tend to focus on the discriminative sub-actions of an action while ignoring the common sub-actions shared across categories. Neglecting these common sub-actions leaves the localized action segments incomplete, i.e., containing only the discriminative sub-actions. Unlike current approaches, which design complex network architectures to explore more complete actions, in this paper we introduce a novel supervision method named multi-hierarchical category supervision (MHCS) to find more sub-actions rather than only the discriminative ones. Specifically, action categories sharing similar sub-actions are grouped into super-classes through hierarchical clustering. Training with the newly generated super-classes therefore encourages the model to pay more attention to the common sub-actions, which are ignored when training with the original classes. Furthermore, the proposed MHCS is model-agnostic and non-intrusive, so it can be applied directly to existing methods without changing their structures. Through extensive experiments, we verify that our supervision method improves the performance of four state-of-the-art WTAL methods on three public datasets: THUMOS14, ActivityNet1.2, and ActivityNet1.3.
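
The super-class construction mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say what is clustered or how, so the sketch assumes each class is summarized by a feature vector (a hypothetical "prototype"), uses cosine distance with average linkage, and stops at a fixed number of super-classes. The function names and the toy values are likewise made up for illustration.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def build_super_classes(prototypes, num_super_classes):
    """Agglomerative (bottom-up) clustering with average linkage.

    Assumption: `prototypes` holds one feature vector per action class.
    Returns a list mapping each class index to a super-class index.
    """
    clusters = [[i] for i in range(len(prototypes))]
    while len(clusters) > num_super_classes:
        best = None  # (average distance, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dists = [
                    cosine_distance(prototypes[i], prototypes[j])
                    for i in clusters[a]
                    for j in clusters[b]
                ]
                avg = sum(dists) / len(dists)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        # Merge the two closest clusters.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    assignment = [0] * len(prototypes)
    for super_idx, members in enumerate(clusters):
        for class_idx in members:
            assignment[class_idx] = super_idx
    return assignment


def to_super_label(video_label, assignment, num_super_classes):
    """Map a multi-hot video-level class label to a multi-hot super-class label."""
    super_label = [0] * num_super_classes
    for class_idx, present in enumerate(video_label):
        if present:
            super_label[assignment[class_idx]] = 1
    return super_label


# Hypothetical toy prototypes for four classes; the numbers are invented.
class_names = ["HighJump", "LongJump", "GolfSwing", "TennisSwing"]
prototypes = [
    [1.0, 0.9, 0.0, 0.1],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.8],
    [0.1, 0.0, 0.9, 1.0],
]

assignment = build_super_classes(prototypes, num_super_classes=2)
super_label = to_super_label([0, 1, 0, 0], assignment, 2)
print(dict(zip(class_names, assignment)))
print(super_label)
```

In a setup like this, the super-class label would serve as an additional video-level target alongside the original class label, so the supervision changes while the localization model itself is left untouched. Repeating the clustering at several granularities would give several label hierarchies.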

Keywords:

Discriminative model; Computer science; Artificial intelligence; Action (physics); Class (philosophy); Machine learning; Hierarchical clustering; Cluster analysis; Pattern recognition (psychology); Natural language processing

Metrics

FWCI (Field Weighted Citation Impact): 1.12

Citation Normalized Percentile: 0.80

Topics

Human Pose and Action Recognition

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Anomaly Detection Techniques and Applications

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Adaptive Mutual Supervision for Weakly-Supervised Temporal Action Localization

PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization

Weakly-Supervised Temporal Action Localization with Multi-Modal Plateau Transformers

Weakly-supervised temporal action localization using multi-branch attention weighting

Weakly Supervised Temporal Action Localization by Multi-Stage Fusion Network