JOURNAL ARTICLE

Snippet-to-Prototype Contrastive Consensus Network for Weakly Supervised Temporal Action Localization

Yuxiang Shao, Feifei Zhang, Changsheng Xu

Year: 2024; Journal: IEEE Transactions on Multimedia; Vol: 26; Pages: 6717-6729; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Weakly-supervised temporal action localization aims to localize action instances from untrimmed videos with only video-level labels. Due to the lack of frame-wise annotations, most methods embrace a localization-by-classification paradigm. However, the large supervision gap between classification and localization hinders models from obtaining accurate snippet-wise classification sequences and action proposals. We propose a snippet-to-prototype contrastive consensus network (SPCC-Net) to simultaneously generate feature-level and label-level supervision information to narrow the supervision gap between classification and localization. Specifically, the network adopts a two-stream framework incorporating the optical flow and fusion streams to fully leverage the motion and complementary information from multiple modalities. Firstly, the snippet-to-prototype contrast module is executed within each stream to learn prototypes for all categories and contrast them with action snippets to guarantee intra-class compactness and inter-class separability of snippet features. Secondly, for generating accurate label-level supervision information through complementary information of multimodal features, the multi-modality consensus module ensures not only category consistency through knowledge distillation but also semantic consistency through contrastive learning. Finally, we introduce the auxiliary multiple instance learning (MIL) loss to alleviate the issue that existing MIL-based methods only localize sparse discriminative snippets. Extensive experiments are conducted on two public datasets, THUMOS-14 and ActivityNet-1.3, to demonstrate the superior performance of our method over state-of-the-art methods.
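The snippet-to-prototype contrast described in the abstract can be illustrated with a minimal sketch. This is not the authors' formulation: it assumes an InfoNCE-style loss over cosine similarities, and the function name `snippet_to_prototype_loss`, the `temperature` value, and the toy data are all hypothetical choices made for illustration. Each snippet feature is pulled toward the prototype of its own category and pushed away from the other prototypes, which is one common way to encourage intra-class compactness and inter-class separability.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def snippet_to_prototype_loss(snippets, labels, prototypes, temperature=0.1):
    """InfoNCE-style loss between snippet features and class prototypes.

    Assumed shapes (illustrative, not taken from the paper):
      snippets:   (N, D) features of snippets treated as action snippets
      labels:     (N,)   integer category index of each snippet
      prototypes: (C, D) one prototype vector per category
    """
    z = l2_normalize(snippets)
    p = l2_normalize(prototypes)
    logits = z @ p.T / temperature                       # (N, C) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-probability of each snippet's own prototype, averaged over snippets.
    return float(-log_prob[np.arange(len(labels)), labels].mean())

# Toy data: 3 categories, 16-dimensional features.
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(3, 16))
labels = np.array([0, 1, 2, 0, 1, 2])

# Snippets close to their own prototype versus snippets close to a wrong prototype.
aligned = prototypes[labels] + 0.05 * rng.normal(size=(6, 16))
shuffled = prototypes[(labels + 1) % 3] + 0.05 * rng.normal(size=(6, 16))

loss_aligned = snippet_to_prototype_loss(aligned, labels, prototypes)
loss_shuffled = snippet_to_prototype_loss(shuffled, labels, prototypes)
print(loss_aligned < loss_shuffled)
```

In this sketch the prototypes are fixed random vectors; in a trained network they would presumably be learned jointly with the snippet features, and the choice of which snippets count as action snippets would come from the model rather than from given labels.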

Keywords:
Snippet; Computer science; Discriminative model; Artificial intelligence; Leverage (statistics); Consistency (knowledge bases); Machine learning; Feature extraction; Margin (machine learning); Pattern recognition (psychology); Information retrieval

Metrics

Cited By: 7
FWCI (Field Weighted Citation Impact): 3.71
Refs: 77
Citation Normalized Percentile: 0.87

Topics

Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Vision and Imaging
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Consensus Contrastive Sampling Network for Weakly-Supervised Temporal Action Localization

应诚 陶

Journal: Computer Science and Application; Year: 2024; Vol: 14 (02); Pages: 183-199

JOURNAL ARTICLE

Discriminative Action Snippet Propagation Network for Weakly Supervised Temporal Action Localization

Yuanjie Dang, Huang Chun-xia, Peng Chen, Dongdong Zhao, Nan Gao, Ronghua Liang, Ruohong Huan

Journal: ACM Transactions on Multimedia Computing Communications and Applications; Year: 2024; Vol: 20 (6); Pages: 1-21

JOURNAL ARTICLE

Deep snippet selective network for weakly supervised temporal action localization

Yongxin Ge, Xiaolei Qin, Dan Yang, Martin Jägersand

Journal: Pattern Recognition; Year: 2020; Vol: 110; Pages: 107686-107686