JOURNAL ARTICLE

Reasoning with Heterogeneous Graph Alignment for Video Question Answering

Jiang Pin, Yahong Han

Year: 2020; Journal: Proceedings of the AAAI Conference on Artificial Intelligence; Vol: 34 (07); Pages: 11109-11116; Publisher: Association for the Advancement of Artificial Intelligence

Abstract

The dominant video question answering methods are based on fine-grained representations or model-specific attention mechanisms. They usually process the video and the question separately, then feed the representations of the different modalities into subsequent late-fusion networks. Although these methods use information from one modality to boost the other, they neglect to integrate both inter- and intra-modality correlations in a unified module. We propose a deep heterogeneous graph alignment network over the video shots and question words. Furthermore, we explore the network architecture in four steps: representation, fusion, alignment, and reasoning. Within our network, inter- and intra-modality information can be aligned and can interact simultaneously over the heterogeneous graph, and is then used for cross-modal reasoning. We evaluate our method on three benchmark datasets and conduct extensive ablation studies to verify the effectiveness of the network architecture. Experiments show the network to be superior in quality.

Keywords:
Computer science; Modality (human–computer interaction); Artificial intelligence; Benchmark (surveying); Graph; Question answering; Heterogeneous network; Modalities; Representation (politics); Network architecture; Theoretical computer science; Computer network

Metrics

Cited by: 176
FWCI (Field Weighted Citation Impact): 9.96
References: 41
Citation Normalized Percentile: 0.99 (is in top 1%; is in top 10%)

Topics

Multimodal Machine Learning Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Human Pose and Action Recognition
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Video Analysis and Summarization
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Heterogeneous-Graph Reasoning With Context Paraphrase for Commonsense Question Answering

Yujie Wang, Zhang Hu, Jiye Liang, Ru Li

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing; Year: 2024; Vol: 32; Pages: 3759-3770

JOURNAL ARTICLE

Graph-based relational reasoning network for video question answering

Tao Tan, Guanglu Sun

Journal: Machine Vision and Applications; Year: 2024; Vol: 36 (1)

JOURNAL ARTICLE

Cross-modal heterogeneous graph reasoning network for visual question answering

Jing Zhang, J. X. Teng, Weichao Ding, Zhe Wang

Journal: Neural Computing and Applications; Year: 2025; Vol: 37 (22); Pages: 17701-17721