JOURNAL ARTICLE

Graph Prompts: Adapting Video Graph for Video Question Answering

Abstract

Because videos are inherently dynamic, perceiving and reasoning about temporal information is a key focus of Video Question Answering (VideoQA). In recent years, several methods have explored relationship-level temporal modeling with graph-structured video representations. Unfortunately, these methods rely heavily on the question text, which makes it difficult to perceive and reason about video content that is not explicitly mentioned in the question. To address this challenge, we propose Graph Prompts-based VideoQA (GP-VQA), which adopts a video-based graph structure for enhanced video understanding. GP-VQA has two stages: pre-training and prompt tuning. In pre-training, we define a pretext task that requires GP-VQA to reason about randomly masked nodes or edges in the video graph, so that GP-VQA learns to reason from video-guided information. In prompt tuning, we organize the textual question into a question graph and implement message passing from the video graph to the question graph, thereby transferring the video-based reasoning ability learned in video graph completion to VideoQA. Extensive experiments on various datasets demonstrate the promising performance of GP-VQA.
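The two stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`mask_graph`, `video_to_question_messages`), the `MASK` placeholder, the masking ratio, and the use of scaled dot-product attention to aggregate messages are all assumptions made for illustration, since the abstract does not specify them. The sketch only shows the general shape of (1) hiding random nodes and edges of a video graph to create completion targets and (2) passing messages from video-graph nodes to question-graph nodes.

```python
import math
import random

MASK = None  # hypothetical placeholder standing in for a masked node feature


def mask_graph(nodes, edges, ratio, rng):
    """Hide a random fraction of nodes and edges; return masked copies and targets."""
    n_mask = max(1, round(ratio * len(nodes)))
    e_mask = max(1, round(ratio * len(edges)))
    node_ids = set(rng.sample(range(len(nodes)), n_mask))
    edge_ids = set(rng.sample(range(len(edges)), e_mask))
    masked_nodes = [MASK if i in node_ids else feat for i, feat in enumerate(nodes)]
    masked_edges = [e for i, e in enumerate(edges) if i not in edge_ids]
    # The hidden items are what a model would be asked to reconstruct.
    targets = {
        "nodes": {i: nodes[i] for i in node_ids},
        "edges": [edges[i] for i in sorted(edge_ids)],
    }
    return masked_nodes, masked_edges, targets


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def video_to_question_messages(video_nodes, question_nodes):
    """One round of message passing from video nodes to question nodes.

    Assumption: each question node attends over all video nodes with scaled
    dot-product attention and adds the weighted sum to its own feature.
    """
    dim = len(video_nodes[0])
    updated = []
    for q in question_nodes:
        scores = [sum(a * b for a, b in zip(q, v)) / math.sqrt(dim) for v in video_nodes]
        weights = softmax(scores)
        message = [sum(w * v[d] for w, v in zip(weights, video_nodes)) for d in range(dim)]
        updated.append([qi + mi for qi, mi in zip(q, message)])
    return updated


# Toy data: four video nodes with 2-d features and four edges (all made up).
rng = random.Random(0)
video_nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
video_edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
masked_nodes, masked_edges, targets = mask_graph(video_nodes, video_edges, 0.25, rng)

# Two question nodes receive messages from the (unmasked) video graph.
question_nodes = [[1.0, 0.0], [0.0, 1.0]]
updated_question = video_to_question_messages(video_nodes, question_nodes)
```

In a trained system the masked items would be predicted by a learned graph encoder and the message-passing weights would be learned parameters; here both are replaced by fixed arithmetic so the data flow stays visible.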

Keywords:
Computer science; Representation (politics); Object (grammar); Information retrieval; Artificial intelligence; Computer vision

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 1.06
Refs: 0
Citation Normalized Percentile: 0.67

Topics

Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Robotics and Sensor-Based Localization
Physical Sciences →  Engineering →  Aerospace Engineering
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

BOOK-CHAPTER

Video Graph Transformer for Video Question Answering

Junbin Xiao, Pan Zhou, Tat‐Seng Chua, Shuicheng Yan

Lecture Notes in Computer Science Year: 2022 Pages: 39-58
JOURNAL ARTICLE

Contrastive Video Question Answering via Video Graph Transformer

Junbin Xiao, Pan Zhou, Angela Yao, Yicong Li, Richang Hong, Shuicheng Yan, Tat‐Seng Chua

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence Year: 2023 Vol: 45 (11) Pages: 13265-13280
JOURNAL ARTICLE

Location-Aware Graph Convolutional Networks for Video Question Answering

Deng Huang, Peihao Chen, Runhao Zeng, Qing Du, Mingkui Tan, Chuang Gan

Journal: Proceedings of the AAAI Conference on Artificial Intelligence Year: 2020 Vol: 34 (07) Pages: 11021-11028