JOURNAL ARTICLE

Knowledge-Aware Causal Inference Network for Visual Dialog

Abstract

Effective knowledge and interaction across modalities are key to Visual Dialog. The classic graph-based framework, which connects the dialog history directly to the answer, often fails to give the right answer because of the spurious guidance and strong bias induced by the dialog history. The recent causal inference framework, which removes this direct connection, improves generalization but at the cost of accuracy. In this work, we propose a novel Knowledge-Aware Causal Inference framework (KACI-Net), in which commonsense knowledge is introduced into the causal inference framework to achieve both high accuracy and good generalization. Specifically, commonsense knowledge is first generated from the entities extracted from the question and then fused with the language and visual features through co-attention to produce the final answer. Comparisons with a knowledge-unaware framework and a graph-based knowledge-aware framework on the VisDial v1.0 dataset show the superiority of the proposed framework and verify the effectiveness of commonsense knowledge for sound reasoning in Visual Dialog. High scores on both the NDCG and MRR metrics indicate a good trade-off between accuracy and generalization.
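
The abstract reads MRR as a measure of accuracy and NDCG as a measure of generalization. A minimal sketch of how the two metrics are typically computed for one dialog turn is given below. It is not taken from the paper: the function names are hypothetical, and the NDCG variant (linear gain, log2 discount, cutoff k equal to the number of candidates with non-zero relevance) is an assumption based on the commonly described VisDial v1.0 evaluation protocol.

```python
import math


def mean_reciprocal_rank(gt_ranks):
    """Average of 1/rank over questions.

    gt_ranks holds the 1-based rank that the model assigned to the single
    ground-truth answer of each question.
    """
    return sum(1.0 / r for r in gt_ranks) / len(gt_ranks)


def ndcg(scores, relevance):
    """NDCG for one question (assumed VisDial-style variant).

    scores:    model score for each candidate answer
    relevance: human-annotated relevance in [0, 1] for each candidate
    The cutoff k is the number of candidates with non-zero relevance.
    """
    k = sum(1 for r in relevance if r > 0)
    if k == 0:
        return 0.0
    # Candidate indices sorted by descending model score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dcg = sum(relevance[i] / math.log2(pos + 2)
              for pos, i in enumerate(order[:k]))
    # Ideal ordering places the most relevant candidates first.
    ideal = sorted(relevance, reverse=True)[:k]
    idcg = sum(r / math.log2(pos + 2) for pos, r in enumerate(ideal))
    return dcg / idcg
```

Under these assumptions, MRR rewards placing the one ground-truth answer near the top of the ranking, while NDCG gives credit for every candidate that annotators judged relevant, which is why a model can score well on one metric and poorly on the other.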

Keywords:
Computer science; Dialog box; Inference; Commonsense reasoning; Artificial intelligence; Commonsense knowledge; Causal inference; Generalization; Graph; Machine learning; Natural language processing; Knowledge graph; Domain knowledge; Theoretical computer science

Metrics

Cited By: 7
FWCI (Field Weighted Citation Impact): 1.27
Refs: 29
Citation Normalized Percentile: 0.76

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Context-Aware Graph Inference With Knowledge Distillation for Visual Dialog

Dan Guo, Hui Wang, Meng Wang

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, Year: 2021, Vol: 44 (10), Pages: 6056-6073

JOURNAL ARTICLE

Textual-Visual Reference-Aware Attention Network for Visual Dialog

Dan Guo, Hui Wang, Shuhui Wang, Meng Wang

Journal: IEEE Transactions on Image Processing, Year: 2020, Vol: 29, Pages: 6655-6666

JOURNAL ARTICLE

Heterogeneous Knowledge Network for Visual Dialog

Lei Zhao, Junlin Li, Lianli Gao, Yunbo Rao, Jingkuan Song, Heng Tao Shen

Journal: IEEE Transactions on Circuits and Systems for Video Technology, Year: 2022, Vol: 33 (2), Pages: 861-871