Wenfei Yang, Tianzhu Zhang, Yongdong Zhang, Feng Wu
Weakly supervised temporal sentence grounding offers better scalability and practicality than fully supervised methods in real-world application scenarios. However, most existing methods cannot model fine-grained video-text local correspondences well and lack effective supervision signals for correspondence learning, yielding unsatisfactory performance. To address these issues, we propose an end-to-end Local Correspondence Network (LCNet) for weakly supervised temporal sentence grounding. The proposed LCNet enjoys several merits. First, we represent video and text features in a hierarchical manner to model fine-grained video-text correspondences. Second, we design a self-supervised cycle-consistent loss as a learning guidance for video and text matching. To the best of our knowledge, this is the first work to fully explore the fine-grained correspondences between video and text for temporal sentence grounding by using self-supervised learning. Extensive experimental results on two benchmark datasets demonstrate that the proposed LCNet significantly outperforms existing weakly supervised methods.
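The abstract describes the cycle-consistent loss only at a high level. The PyTorch sketch below illustrates one common way such a loss can be instantiated for video-text matching without temporal annotations: each word attends to video frames, the attended video vectors attend back to the words, and a word is supervised to land back on itself. The function name, the temperature parameter, and the soft-attention formulation are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn.functional as F


def cycle_consistency_loss(video_feats, text_feats, temperature=0.1):
    """Hypothetical sketch of a self-supervised cycle-consistent matching loss.

    video_feats: (T, D) frame-level features
    text_feats:  (N, D) word-level features

    Forward pass: words attend over frames; backward pass: the attended
    video vectors attend over words. The cycle target (each word returns
    to its own index) needs no temporal boundary labels.
    """
    v = F.normalize(video_feats, dim=-1)
    t = F.normalize(text_feats, dim=-1)

    # forward: text -> video soft attention over frames
    attn_tv = F.softmax(t @ v.T / temperature, dim=-1)   # (N, T)
    v_hat = attn_tv @ v                                  # (N, D) attended video

    # backward: attended video -> text; word i should map back to word i
    logits_vt = v_hat @ t.T / temperature                # (N, N)
    target = torch.arange(t.size(0), device=t.device)
    return F.cross_entropy(logits_vt, target)


# usage with random features (T=64 frames, N=12 words, D=256 dims)
loss = cycle_consistency_loss(torch.randn(64, 256), torch.randn(12, 256))
loss.backward() if loss.requires_grad else None
```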