JOURNAL ARTICLE

Task-Guided Dynamic Visual Reasoning for Visual Question Answering

Cong Yao, Hongwei Mo

Year: 2025 Journal: International Journal of Humanoid Robotics Publisher: World Scientific

Abstract

Visual reasoning, an advanced cognitive ability for models, has been widely studied. In visual question answering (VQA), symbolic reasoning methods based on task decomposition let a model reason about visual content in a human-like logical sequence, and they have performed well on a range of descriptive question types. A typical VQA task pairs a question with an image. Compared with static visual reasoning over images, dynamic visual reasoning over video poses greater challenges in logic, temporal understanding, and causal structure, and earlier methods struggle to capture the changing relationships among objects in dynamic scenes. In this study, we propose a task-guided dynamic visual reasoning method for visual question answering. The method models the spatiotemporal states of objects in dynamic scenes, decomposes each question into task steps, and then reasons over the resulting spatiotemporal dynamic scene graph neural network. We evaluated the method on two benchmarks, CLEVRER and CATER. The results show that our model effectively extracts spatiotemporal features of objects in dynamic scenes, performs well on descriptive questions, and improves accuracy on explanatory and predictive questions relative to the comparison models.
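
The pipeline described in the abstract (track object states over time, decompose the question into task steps, run those steps over a scene graph) can be illustrated with a toy sketch. This is not the authors' model: the abstract gives no implementation details, so every name here (`ObjectState`, `build_scene_graph`, `run_program`, the step names, the contact threshold) is hypothetical, the "graph" is a plain dictionary and not a neural network, and the question is decomposed by hand where a real system would use a learned parser.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectState:
    """Hypothetical per-frame state of one tracked object."""
    obj_id: int
    shape: str
    color: str
    x: float
    y: float


def build_scene_graph(frames, contact_dist=1.0):
    """Collect collision events (frame index, id_a, id_b) from per-frame states.

    A collision is recorded only on the frame where two objects first come
    within contact_dist of each other (an assumed, simplistic criterion).
    """
    collisions = []
    touching_prev = set()
    for t, frame in enumerate(frames):
        ids = sorted(frame)
        touching = set()
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                dist = math.hypot(frame[a].x - frame[b].x, frame[a].y - frame[b].y)
                if dist <= contact_dist:
                    touching.add((a, b))
        for a, b in sorted(touching - touching_prev):
            collisions.append((t, a, b))
        touching_prev = touching
    return {"frames": frames, "collisions": collisions}


def run_program(graph, program):
    """Execute a list of task steps; each step transforms the running state."""
    objects = graph["frames"][0]  # shape and color are static in this toy
    state = set(objects)
    for op, *args in program:
        if op == "filter_color":
            state = {i for i in state if objects[i].color == args[0]}
        elif op == "filter_shape":
            state = {i for i in state if objects[i].shape == args[0]}
        elif op == "collisions_of":
            state = [e for e in graph["collisions"] if e[1] in state or e[2] in state]
        elif op == "count":
            state = len(state)
        else:
            raise ValueError(f"unknown step: {op}")
    return state


def make_frame(cube_x):
    """Toy scene: a red cube slides toward a stationary blue sphere."""
    return {
        0: ObjectState(0, "cube", "red", cube_x, 0.0),
        1: ObjectState(1, "sphere", "blue", 2.0, 0.0),
        2: ObjectState(2, "cylinder", "green", 10.0, 0.0),
    }


frames = [make_frame(0.0), make_frame(1.0), make_frame(1.5)]
graph = build_scene_graph(frames)

# "How many collisions does the red cube take part in?", decomposed by hand.
program = [
    ("filter_color", "red"),
    ("filter_shape", "cube"),
    ("collisions_of",),
    ("count",),
]
answer = run_program(graph, program)
print(answer)
```

Each step consumes the output of the previous one, so a descriptive question reduces to a short chain of filters and queries over the object states and events.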

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 29
Citation Normalized Percentile: 0.45

Topics

Multimodal Machine Learning Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Constraint Satisfaction and Optimization
Physical Sciences → Computer Science → Computer Networks and Communications

Related Documents

JOURNAL ARTICLE

Comprehensive-perception dynamic reasoning for visual question answering

Kai Shuang, Jinyu Guo, Zihan Wang

Journal: Pattern Recognition Year: 2022 Vol: 131 Pages: 108878-108878
JOURNAL ARTICLE

Question-guided spatial relation graph reasoning model for visual question answering

Hong Lan, Pufen Zhang

Journal: Journal of Image and Graphics Year: 2022 Vol: 27 (7) Pages: 2274-2286
JOURNAL ARTICLE

Ques-to-Visual Guided Visual Question Answering

Xiangyu Wu, Jianfeng Lu, Zhuanfeng Li, Fengchao Xiong

Journal: 2022 IEEE International Conference on Image Processing (ICIP) Year: 2022 Pages: 4193-4197