Action Completeness Modeling with Background Aware Networks for Weakly-Supervised Temporal Action Localization

Md Moniruzzaman; Zhaozheng Yin; Zhihai He; Ruwen Qin; Ming C. Leu

doi:10.1145/3394171.3413687

JOURNAL ARTICLE

Year: 2020 Pages: 2166-2174

Abstract

Fully-supervised methods for temporal action localization in untrimmed videos have achieved impressive results. Performance remains unsatisfactory, however, for weakly-supervised temporal action localization, where only video-level action labels are given, without timestamp annotations of when the actions occur. The main reason is that weakly-supervised networks focus only on highly discriminative frames, while both the background and action classes contain ambiguous frames. Ambiguous background frames closely resemble real actions, so they may be treated as target actions and produce false positives. Conversely, ambiguous action frames, which may contain action instances, are prone to being missed by weakly-supervised networks as false negatives, resulting in coarse localization. To solve these problems, we introduce a novel weakly-supervised method, Action Completeness Modeling with Background Aware Networks (ACM-BANets). Our Background Aware Network (BANet) has a weight-sharing two-branch architecture with an action-guided Background aware Temporal Attention Module (B-TAM) and an asymmetrical training strategy; it suppresses both highly discriminative and ambiguous background frames to remove false positives. Our action completeness modeling uses multiple BANets, which are forced to discover different but complementary action instances so that actions are localized completely across both highly discriminative and ambiguous action frames. In the i-th iteration, the i-th BANet discovers discriminative features, which are then erased from the feature map. The partially erased feature map is fed into the (i+1)-th BANet in the next iteration, forcing it to discover discriminative features different from those found by the i-th BANet. Evaluated on two challenging untrimmed video datasets, THUMOS14 and ActivityNet1.3, our approach outperforms all current weakly-supervised methods for temporal action localization.
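
The iterative erasing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `norm_scorer` is a hypothetical stand-in for a trained BANet, the per-snippet features are plain lists of floats, and the threshold of 0.5 is an arbitrary assumption. The background suppression performed by B-TAM is not modeled here.

```python
from typing import Callable, List, Sequence

Feature = List[float]
Scorer = Callable[[Sequence[Feature]], List[float]]


def norm_scorer(features: Sequence[Feature]) -> List[float]:
    """Hypothetical stand-in for one BANet (an assumption, not the paper's model).

    Scores each snippet by its feature magnitude relative to the largest
    magnitude currently present in the video.
    """
    mags = [sum(abs(x) for x in f) for f in features]
    peak = max(mags, default=0.0)
    return [m / peak if peak > 0 else 0.0 for m in mags]


def iterative_erasing(
    features: Sequence[Feature],
    scorers: Sequence[Scorer],
    threshold: float = 0.5,  # assumed value, chosen only for illustration
) -> List[bool]:
    """Return a per-snippet mask of everything discovered across iterations."""
    current = [list(f) for f in features]  # copy so the input is not mutated
    discovered = [False] * len(current)
    for scorer in scorers:
        # The i-th scorer sees the feature map left over by earlier iterations.
        attention = scorer(current)
        for t, a in enumerate(attention):
            if a > threshold:
                discovered[t] = True
                # Erase the discovered snippet so the next scorer must look elsewhere.
                current[t] = [0.0] * len(current[t])
    return discovered


if __name__ == "__main__":
    video = [[0.1], [1.0], [0.6], [0.4], [0.05]]
    print(iterative_erasing(video, [norm_scorer]))
    print(iterative_erasing(video, [norm_scorer, norm_scorer]))
```

With a single scorer only the two strongest snippets are found; a second scorer, fed the partially erased features, additionally picks up the weaker snippet next to them. The sketch also shows why background suppression matters: with enough iterations, a purely relative scorer like this one would eventually start selecting low-magnitude background snippets.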

Keywords:

Discriminative model; Artificial intelligence; Computer science; Timestamp; False positive paradox; Feature (linguistics); Action (physics); Pattern recognition (psychology); Completeness (order theory); Class (philosophy); Machine learning; Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 3.15

Citation Normalized Percentile: 0.93

Topics

Human Pose and Action Recognition

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Diabetic Foot Ulcer Assessment and Management

Health Sciences → Medicine → Endocrinology, Diabetes and Metabolism

Related Documents

ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization

Completeness Modeling and Context Separation for Weakly Supervised Temporal Action Localization

Background-Aware Robust Context Learning for Weakly-Supervised Temporal Action Localization

Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization