Weakly supervised temporal video grounding aims to localize the temporal moment or segment corresponding to a sentence query in an untrimmed, long video using only video-level annotations. Due to the lack of ground-truth moment annotations, existing methods suffer from several issues, such as uncertainty about event start/end points and incomplete semantic matching with the sentence. We design our model to address these challenges. To reduce learning uncertainty and localize moments more accurately, we compute a matching score curve between each video frame and the sentence query, and use this curve to generate pseudo ground truth that supervises the localization network. To achieve complete semantic matching with the sentence, we propose a semantic prediction module trained on matched video-sentence pairs and a semantic contrastive training strategy for unmatched pairs. Finally, to further improve accuracy, the contrastive training strategy constructs several contrastive samples that share similar but distinct semantics, which helps the model discriminate between semantics and achieve complete semantic matching. We conduct extensive experiments on the Charades-STA, ActivityNet Captions, and DiDeMo datasets. The results demonstrate that our proposed method significantly outperforms the state-of-the-art, by more than 10% when the Intersection over Union (IoU) threshold ranges from 0.6 to 0.8, and by more than 30% when the IoU threshold equals 0.7. The code is publicly available at https://github.com/anonymousabca/WLML.
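As an illustration of turning a frame-query matching score curve into pseudo ground truth, the step could be sketched as follows. This is a minimal sketch under our own assumptions (a relative `ratio` threshold and a single contiguous high-score region), not the paper's actual implementation:

```python
import numpy as np

def pseudo_boundaries(scores, ratio=0.5):
    """Derive a pseudo ground-truth segment from a per-frame matching
    score curve by keeping frames whose score exceeds a fraction of the
    curve's maximum (both the ratio and the contiguity assumption are
    illustrative, not taken from the paper).

    Returns (start_frame, end_frame) indices.
    """
    scores = np.asarray(scores, dtype=float)
    thr = ratio * scores.max()              # relative threshold (assumption)
    above = np.flatnonzero(scores >= thr)   # indices of high-scoring frames
    return int(above[0]), int(above[-1])    # span covering all kept frames

# Example: frames 2-4 score highest, so they form the pseudo segment.
start, end = pseudo_boundaries([0.1, 0.2, 0.8, 0.9, 0.7, 0.2])
```

The resulting (start, end) pair can then serve as a supervision target for the localization network in place of the missing human annotation.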
Minghang Zheng, Yanjie Huang, Qing-Chao Chen, Yang Liu
Meng Liu, Yupeng Hu, Weili Guan, Liqiang Nie
Yenan Xu, Wanru Xu, Zhenjiang Miao
Tingting Han, Kai Wang, Jun Yu, Jianping Fan