Visual reasoning, as an advanced cognitive ability of models, has been widely studied. In visual question answering, symbolic reasoning based on task decomposition allows a model to reason over visual content following human logical patterns, and this approach has achieved impressive performance on various descriptive question types. A typical visual question answering task pairs a question with a static image. Compared with static visual reasoning over images, dynamic visual reasoning over video content poses greater challenges in logic, temporal comprehension, and causal structure, making it difficult for prior methods to capture the dynamic interrelationships among objects in dynamic scenes. In this study, we propose a task-guided dynamic visual reasoning method for visual question answering, which models the spatiotemporal states of objects in dynamic scenes, decomposes questions into task steps, and performs reasoning over the constructed spatiotemporal dynamic scene graph neural network. We evaluated our method on two benchmarks, CLEVRER and CATER; the results show that our model effectively extracts spatiotemporal features of objects in dynamic scenes, performs well on descriptive questions, and improves accuracy on explanatory and predictive questions compared with baseline models.
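The abstract's pipeline (model object states over time, decompose the question into task steps, execute the steps over a scene representation) can be illustrated with a minimal toy sketch. All class and function names here are hypothetical illustrations, not the paper's actual implementation; in particular, the paper uses a spatiotemporal scene graph neural network, whereas this sketch uses a plain symbolic scene structure for clarity.

```python
from dataclasses import dataclass

@dataclass
class ObjectState:
    """Hypothetical per-object record: identity plus positions over time."""
    obj_id: str
    color: str
    positions: dict  # frame index -> (x, y)

class Scene:
    """Toy stand-in for a spatiotemporal scene graph."""
    def __init__(self, objects):
        self.objects = list(objects)

    def filter_color(self, color):
        return [o for o in self.objects if o.color == color]

    def moved(self, obj, t0, t1):
        return obj.positions[t0] != obj.positions[t1]

def execute(program, scene, t0, t1):
    """Run a decomposed question (a list of task steps) over the scene.

    Each step consumes the previous step's output, mirroring
    task-decomposition-style symbolic reasoning.
    """
    result = None
    for op, arg in program:
        if op == "filter_color":
            result = scene.filter_color(arg)
        elif op == "count_moving":
            result = sum(scene.moved(o, t0, t1) for o in result)
    return result
```

For example, "How many red objects moved?" might decompose into the step sequence `[("filter_color", "red"), ("count_moving", None)]`, which first selects red objects and then counts those whose position changed between two frames.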
Xinyu Liu, Chenchen Jing, Mingliang Zhai, Yuwei Wu, Yunde Jia