JOURNAL ARTICLE

Ensemble Prototype Network For Weakly Supervised Temporal Action Localization

Kewei Wu, Wenjie Luo, Zhao Xie, Dan Guo, Zhao Zhang, Richang Hong

Year: 2024   Journal: IEEE Transactions on Neural Networks and Learning Systems   Vol: 36 (3)   Pages: 4560-4574   Publisher: Institute of Electrical and Electronics Engineers

Abstract

Weakly supervised temporal action localization (TAL) aims to localize action instances in untrimmed videos using only video-level action labels. Without snippet-level labels, it is hard to assign every snippet an accurate action or background category. The main difficulties are the large variations introduced by unconstrained background snippets and by multiple subactions within action snippets. Existing prototype models describe snippets by covering them with clusters (defined as prototypes). In this work, we argue that clustered prototypes, which cover snippets with simple variations, still misclassify snippets with large variations. We propose an ensemble prototype network (EPNet), which ensembles prototypes learned with consensus-aware clustering. The network stacks a consensus prototype learning (CPL) module and an ensemble snippet weight learning (ESWL) module as one stage and extends this stage to multiple stages in an ensemble learning manner. The CPL module learns a consensus matrix by estimating the similarity of clustering labels between two successive clustering generations. The consensus matrix optimizes the clustering to learn consensus prototypes, which can predict snippets with consensus labels. The ESWL module estimates the weights of misclassified snippets using the snippet-level loss. These weights update the posterior probabilities of the snippets in the clustering, so that new prototypes are learned in the next stage. We use multiple stages to learn multiple prototypes, which can cover snippets with large variations for accurate snippet classification. Extensive experiments show that our method achieves state-of-the-art performance among weakly supervised TAL methods on the THUMOS'14, ActivityNet v1.2, and ActivityNet v1.3 benchmark datasets.
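
The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the formulas, so the pairwise-agreement form of the consensus matrix, the exponential weight update, and all function and parameter names (`coassociation`, `consensus_matrix`, `reweight`, `lr`) are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def coassociation(labels):
    """Return an n x n matrix with 1.0 where two snippets share a cluster label."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def consensus_matrix(prev_labels, curr_labels):
    """Assumed form of a consensus matrix: entry (i, j) is 1.0 when snippets
    i and j have the same relation (same cluster or different clusters) in
    two successive clustering generations, and 0.0 otherwise."""
    agree = coassociation(prev_labels) == coassociation(curr_labels)
    return agree.astype(float)

def reweight(weights, losses, lr=1.0):
    """Assumed boosting-style update: snippets with a higher snippet-level
    loss receive a larger weight; the result is renormalised to sum to 1."""
    weights = np.asarray(weights, dtype=float)
    losses = np.asarray(losses, dtype=float)
    updated = weights * np.exp(lr * losses)
    return updated / updated.sum()
```

Comparing pairwise relations rather than raw label values makes the sketch insensitive to cluster indices being permuted between generations, which is one reason a pairwise form is a plausible reading here. In a multi-stage setting, the output of `reweight` would feed the clustering of the next stage.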

Keywords:
Snippet; Computer science; Benchmark (surveying); Artificial intelligence; Similarity (geometry); Cluster analysis; Ensemble learning; Machine learning; Matching (statistics); Pattern recognition (psychology); Information retrieval; Mathematics

Metrics

Cited By: 9
FWCI (Field Weighted Citation Impact): 4.77
Refs: 53
Citation Normalized Percentile: 0.91

Topics

Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Anomaly Detection Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Adaptive Prototype Learning for Weakly-Supervised Temporal Action Localization

Wang Luo, Huan Ren, Tianzhu Zhang, Wenfei Yang, Yongdong Zhang

Journal:   IEEE Transactions on Image Processing Year: 2024 Vol: 34 Pages: 3154-3168
JOURNAL ARTICLE

Snippet-to-Prototype Contrastive Consensus Network for Weakly Supervised Temporal Action Localization

Yuxiang Shao, Feifei Zhang, Changsheng Xu

Journal:   IEEE Transactions on Multimedia Year: 2024 Vol: 26 Pages: 6717-6729
JOURNAL ARTICLE

Action Coherence Network for Weakly-Supervised Temporal Action Localization

Yuanhao Zhai, Le Wang, Wei Tang, Qilin Zhang, Nanning Zheng, Gang Hua

Journal:   IEEE Transactions on Multimedia Year: 2021 Vol: 24 Pages: 1857-1870
JOURNAL ARTICLE

Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization

Linjiang Huang, Liang Wang, Hongsheng Li

Journal:   2021 IEEE/CVF International Conference on Computer Vision (ICCV) Year: 2021 Pages: 7982-7991