Explore Multi-Step Reasoning in Video Question Answering

Yahong Han

doi:10.1145/3265987.3265996

Year: 2018 Pages: 5-5

Abstract

This invited talk is an extended, more detailed presentation of our paper accepted at ACM MM 2018. Video question answering (VideoQA) inherently involves visual reasoning, and when a question composes multiple logical relations, a model must perform multi-step reasoning to answer it. In this paper, we formulate multi-step reasoning in VideoQA as a new task: answering compositional, logically structured questions based on video content. Existing VideoQA datasets are inadequate as benchmarks for multi-step reasoning because they lack logical structure and exhibit language biases. We therefore design a system that automatically generates a large-scale dataset, SVQA (Synthetic Video Question Answering). Compared with other VideoQA datasets, SVQA contains exclusively long, structured questions involving various spatial and temporal relations between objects. More importantly, each question in SVQA can be decomposed into a human-readable logical tree or chain layout, in which each node represents a sub-task requiring a reasoning operation such as comparison or arithmetic. To answer SVQA questions automatically, we develop a new VideoQA model. In particular, we construct a new attention module that combines a spatial attention mechanism, which addresses the multiple crucial logical sub-tasks embedded in questions, with a refined GRU called ta-GRU (temporal-attention GRU), which captures long-term temporal dependencies and gathers complete visual cues. Experimental results demonstrate that SVQA requires multi-step reasoning and that our model is effective compared with existing models.
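To make the tree and chain layouts concrete, here is a minimal Python sketch that evaluates a decomposed question symbolically over object annotations. Everything in it is a hypothetical illustration, not SVQA's actual generator or program format: the node vocabulary (`scene`, `filter`, `count`, `greater`, `equal`, `add`), the `Node` class, the `evaluate` and `count_of` helpers, and the per-object annotation schema are all assumptions. Spatial and temporal relation nodes, which SVQA questions also contain, are omitted for brevity. The sketch shows only how a compositional question breaks into sub-tasks whose outputs feed their parent nodes; it does not model the neural attention module or ta-GRU.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical annotation of one synthetic video: each object carries static
# attributes plus the action it performs. This schema is an assumption.
Obj = Dict[str, str]


@dataclass
class Node:
    """One reasoning sub-task in a question layout (hypothetical schema)."""
    op: str                                              # e.g. "filter", "count", "greater"
    args: Dict[str, str] = field(default_factory=dict)   # attribute constraints for "filter"
    children: List["Node"] = field(default_factory=list)


def evaluate(node: Node, video: List[Obj]) -> Any:
    """Recursively evaluate a layout: children are solved before their parent."""
    inputs = [evaluate(child, video) for child in node.children]
    if node.op == "scene":        # leaf: every annotated object in the video
        return list(video)
    if node.op == "filter":       # keep objects matching all attribute constraints
        (objs,) = inputs
        return [o for o in objs if all(o.get(k) == v for k, v in node.args.items())]
    if node.op == "count":
        (objs,) = inputs
        return len(objs)
    if node.op == "greater":      # comparison sub-task
        a, b = inputs
        return a > b
    if node.op == "equal":
        a, b = inputs
        return a == b
    if node.op == "add":          # arithmetic sub-task
        a, b = inputs
        return a + b
    raise ValueError(f"unknown op: {node.op}")


def count_of(**attrs: str) -> Node:
    """Chain layout scene -> filter -> count (hypothetical helper)."""
    return Node("count", children=[Node("filter", attrs, [Node("scene")])])


# Hypothetical toy video annotation.
video = [
    {"color": "red", "shape": "cube", "action": "move"},
    {"color": "red", "shape": "cube", "action": "move"},
    {"color": "blue", "shape": "sphere", "action": "rotate"},
    {"color": "red", "shape": "cylinder", "action": "stay"},
]

# "Are there more moving red cubes than rotating blue spheres?" as a tree layout.
tree = Node("greater", children=[
    count_of(color="red", shape="cube", action="move"),
    count_of(color="blue", shape="sphere", action="rotate"),
])

print(evaluate(tree, video))  # True (2 > 1)
```

A chain layout is the degenerate case in which every node has a single child (scene, then filter, then count); a tree layout arises when a comparison or arithmetic node combines the outputs of two or more such chains.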

Keywords:

Question answering, Computer science, Spatial intelligence, Artificial intelligence, Reasoning system, Logical reasoning, Natural language processing

Topics

Multimodal Machine Learning Applications

Human Pose and Action Recognition

Advanced Image and Video Retrieval Techniques

(All three are classified under Physical Sciences → Computer Science → Computer Vision and Pattern Recognition.)

Related Documents

Efficient multi-step reasoning attention network for visual question answering

Multi-Semantic Alignment Co-Reasoning Network for Video Question Answering

Open-Ended Multi-Modal Relational Reasoning for Video Question Answering

Differentiated Attention with Multi-modal Reasoning for Video Question Answering