JOURNAL ARTICLE

Spatio-temporal relational reasoning for video question answering

Abstract

Video question answering is the task of automatically answering questions about videos. Apart from direct practical interest, it provides a good way to benchmark our progress on various tasks in video understanding. A successful algorithm must ground objects of interest and model relationships among them in both the spatial and temporal domains jointly. We show that the existing state-of-the-art approaches, which are based on Convolutional Neural Networks or Recurrent Neural Networks, are not effective at joint reasoning in both spatial and temporal domains. Moreover, they are short-sighted and struggle with long-range dependencies in videos. To address these challenges, we present a novel spatio-temporal reasoning neural module that models complex multi-entity relationships in space and long-term dependencies in time. Our model captures both time-changing object interactions and action dynamics of individual objects in an effective way. We evaluate our module on two benchmark datasets which require spatio-temporal reasoning: TGIF-QA and SVQA. We achieve state-of-the-art performance on both datasets. More significantly, we achieve substantial improvements in some of the most challenging question types, like counting, which demonstrate the effectiveness of our proposed spatio-temporal relational module.

Keywords:
Computer science Question answering Relational database Artificial intelligence Information retrieval Natural language processing

Metrics

5
Cited By
0.53
FWCI (Field Weighted Citation Impact)
50
Refs
0.71
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Video Analysis and Summarization
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

© 2026 ScienceGate Book Chapters — All rights reserved.