JOURNAL ARTICLE

Action Completeness Modeling with Background Aware Networks for Weakly-Supervised Temporal Action Localization

Abstract

Fully-supervised methods for temporal action localization in untrimmed videos have achieved impressive results. Results remain unsatisfactory, however, for weakly-supervised temporal action localization, where only video-level action labels are given, without timestamp annotations of when the actions occur. The main reason is that weakly-supervised networks focus only on the highly discriminative frames, while both the background and action classes also contain ambiguous frames. Ambiguous background frames closely resemble real actions, so they may be treated as target actions and produce false positives. Ambiguous action frames, which may contain action instances, are prone to being missed by weakly-supervised networks as false negatives, resulting in coarse localization. To solve these problems, we introduce a novel weakly-supervised method, Action Completeness Modeling with Background Aware Networks (ACM-BANets). Our Background Aware Network (BANet) uses a weight-sharing two-branch architecture, with an action-guided Background aware Temporal Attention Module (B-TAM) and an asymmetrical training strategy, to suppress both highly discriminative and ambiguous background frames and thereby remove false positives. Our action completeness modeling uses multiple BANets, which are forced to discover different but complementary action instances so that actions are localized completely in both highly discriminative and ambiguous action frames. In the i-th iteration, the i-th BANet discovers discriminative features, which are then erased from the feature map. The partially erased feature map is fed into the (i+1)-th BANet in the next iteration, forcing it to discover discriminative features different from those found by the i-th BANet. Evaluated on two challenging untrimmed video datasets, THUMOS14 and ActivityNet1.3, our approach outperforms all current weakly-supervised methods for temporal action localization.
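
The iterative erasing idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `toy_scorer` stand-in for a BANet, the threshold, and the toy feature values are all assumptions chosen for illustration. Each stage scores the current per-snippet features, keeps the snippets it finds most discriminative, and zeroes them out so the next stage has to look elsewhere; the union of all stages approximates a more complete localization.

```python
from typing import Callable, List, Set, Tuple


def toy_scorer(features: List[float]) -> List[float]:
    """Hypothetical stand-in for one BANet: scores relative to the current peak."""
    peak = max(features)
    if peak <= 0:
        return [0.0] * len(features)
    return [max(f, 0.0) / peak for f in features]


def iterative_erasing(
    features: List[float],
    scorer: Callable[[List[float]], List[float]],
    num_stages: int = 2,
    threshold: float = 0.7,
) -> Tuple[List[List[int]], List[int]]:
    """Run several scoring stages, erasing what each stage discovers.

    Returns the snippet indices found per stage and their sorted union.
    """
    current = list(features)      # working copy of the feature map
    discovered: Set[int] = set()  # snippets found by any earlier stage
    per_stage: List[List[int]] = []

    for _ in range(num_stages):
        scores = scorer(current)
        picked = {
            t for t, s in enumerate(scores)
            if s >= threshold and t not in discovered
        }
        per_stage.append(sorted(picked))
        discovered |= picked
        for t in picked:
            current[t] = 0.0      # erase so the next stage must look elsewhere

    return per_stage, sorted(discovered)


# Toy per-snippet responses: snippets 1-2 are highly discriminative,
# snippets 3-4 are weaker (ambiguous) parts of the same action.
toy_features = [0.1, 0.9, 1.0, 0.6, 0.5, 0.1, 0.05]
stages, union = iterative_erasing(toy_features, toy_scorer)
print(stages)  # [[1, 2], [3, 4]]
print(union)   # [1, 2, 3, 4]
```

In this toy run the first stage finds only the strongest snippets, and the second stage, seeing them erased, recovers the weaker neighbouring snippets. The sketch does not model the background suppression that the abstract attributes to B-TAM and the asymmetrical training strategy.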

Keywords:
Discriminative model; Artificial intelligence; Computer science; Timestamp; False positive paradox; Feature (linguistics); Action (physics); Pattern recognition (psychology); Completeness (order theory); Class (philosophy); Machine learning; Mathematics

Metrics

Cited By: 40
FWCI (Field Weighted Citation Impact): 3.15
Refs: 38
Citation Normalized Percentile: 0.93

Topics

Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Diabetic Foot Ulcer Assessment and Management
Health Sciences →  Medicine →  Endocrinology, Diabetes and Metabolism

Related Documents

JOURNAL ARTICLE

ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization

Bo He, Xitong Yang, Le Kang, Zhiyu Cheng, Xin Zhou, Abhinav Shrivastava

Journal:   2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Year: 2022 Pages: 13915-13925
JOURNAL ARTICLE

Background-Aware Robust Context Learning for Weakly-Supervised Temporal Action Localization

Jinah Kim, Jungchan Cho

Journal:   IEEE Access Year: 2022 Vol: 10 Pages: 65315-65325
JOURNAL ARTICLE

Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization

Pilhyeon Lee, Hyeran Byun

Journal:   arXiv (Cornell University) Year: 2021 Pages: 13648-13657
JOURNAL ARTICLE

Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization

Pilhyeon Lee, Hyeran Byun

Journal:   2021 IEEE/CVF International Conference on Computer Vision (ICCV) Year: 2021 Pages: 13628-13637