Video question answering is the task of automatically answering natural-language questions about videos. Beyond its direct practical interest, it provides a good way to benchmark progress on a range of video-understanding tasks. A successful algorithm must ground the objects of interest and jointly model the relationships among them in both the spatial and temporal domains. We show that existing state-of-the-art approaches, which are based on Convolutional Neural Networks or Recurrent Neural Networks, are not effective at such joint spatio-temporal reasoning. Moreover, they are short-sighted and struggle with long-range dependencies in videos. To address these challenges, we present a novel spatio-temporal reasoning neural module that models complex multi-entity relationships in space and long-term dependencies in time. Our model effectively captures both time-varying object interactions and the action dynamics of individual objects. We evaluate our module on two benchmark datasets that require spatio-temporal reasoning, TGIF-QA and SVQA, and achieve state-of-the-art performance on both. More significantly, we obtain substantial improvements on some of the most challenging question types, such as counting, which demonstrates the effectiveness of our proposed spatio-temporal relational module.