JOURNAL ARTICLE

End-to-end Multi-task Learning Framework for Spatio-Temporal Grounding in Video Corpus

Yingqi Gao, Zhiling Luo, Shiqian Chen, Wei Zhou

Year: 2022 | Published in: Proceedings of the 31st ACM International Conference on Information & Knowledge Management | Pages: 3958-3962

Abstract

In this paper, we consider a novel task, Video Corpus Spatio-Temporal Grounding (VCSTG), for material selection and spatio-temporal adaptation in intelligent video editing. Given a text query depicting an object and a corpus of untrimmed, unsegmented videos, VCSTG aims to localize a sequence of spatio-temporal object tubes from the video corpus. Existing methods tackle the VCSTG task in a multi-stage manner, encoding the query and video representations independently for each task, which leads to local optima. In this paper, we propose a novel one-stage multi-task learning-based framework named MTSTG for the VCSTG task. MTSTG learns unified query and video representations for the video retrieval, temporal grounding and spatial grounding tasks. Video-level, frame-level and object-level contrastive learning are introduced to measure the mutual information between query and video at different granularities. Comprehensive experiments demonstrate that our newly proposed framework outperforms state-of-the-art multi-stage methods on the VidSTG dataset.
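The abstract describes contrastive learning at three granularities (video, frame, object) as a way to measure query-video mutual information. The paper's own implementation is not reproduced here; the sketch below is a minimal, generic InfoNCE-style objective of the kind such contrastive terms typically instantiate, where the function name, NumPy formulation and temperature value are illustrative assumptions, not the authors' code:

```python
import numpy as np

def info_nce_loss(query_emb, candidate_embs, pos_idx, temperature=0.07):
    """Generic InfoNCE-style contrastive loss for one text query against a
    set of candidate embeddings (video-, frame-, or object-level).

    The positive candidate at `pos_idx` should score higher than the
    negatives; the loss is the negative log-probability of the positive
    under a temperature-scaled softmax over cosine similarities."""
    # L2-normalize so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = (c @ q) / temperature
    sims -= sims.max()                       # numerical stability
    probs = np.exp(sims) / np.exp(sims).sum()
    return -np.log(probs[pos_idx])
```

Applying one such loss per granularity and summing them would yield a multi-granularity contrastive objective in the spirit the abstract outlines; a matching query/candidate pair drives the loss toward zero, while a mismatched pair inflates it.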


Metrics

Cited by: 2
FWCI (Field-Weighted Citation Impact): 0.14
References: 21
Citation Normalized Percentile: 0.42


Topics

Multimodal Machine Learning Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Human Pose and Action Recognition (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)