JOURNAL ARTICLE

Prior Knowledge-driven Dynamic Scene Graph Generation with Causal Inference

Abstract

Dynamic scene graph generation (DSGG) aims to construct a set of frame-level scene graphs for a given video. The task suffers from two kinds of spurious correlation. First, a spurious correlation between the input object pair and the predicate label is caused by the biased distribution of predicate samples in the dataset. Second, a spurious correlation between contextual information and the predicate label arises from interference by background content in both the current frame and adjacent frames of the video sequence. To alleviate these spurious correlations, our work is formulated as two sub-tasks: video-specific commonsense graph generation (VsCG) and causal inference (CI). The VsCG module aims to alleviate the first correlation by integrating prior knowledge into prediction. Information from all frames of the current video is used to enhance the commonsense graph constructed from the co-occurrence patterns of all training samples, so that the commonsense graph is augmented with video-specific temporal dependencies. Then, a CI strategy combining intervention and counterfactual reasoning is applied. The intervention component further mitigates the first correlation by forcing the model to consider all possible predicate categories fairly, while the counterfactual component addresses the second correlation by removing the adverse effect of context. Comprehensive experiments on the Action Genome dataset show that the proposed method achieves state-of-the-art performance.
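The abstract gives no implementation details, so the following is only an illustrative sketch of two ideas it names: a commonsense prior built from co-occurrence counts over training triplets, and a counterfactual-style correction that subtracts context-only scores from full scores. The function names, the add-k smoothing, the toy triplets, and the score values are all assumptions made for illustration; they are not the authors' VsCG or CI modules, and the video-specific enhancement and the intervention step are not modeled here.

```python
from collections import Counter, defaultdict


def build_cooccurrence_prior(triplets, predicates, smoothing=1.0):
    """Estimate P(predicate | subject, object) from (subject, predicate, object)
    training triplets, using add-k smoothing (an assumption of this sketch)."""
    counts = defaultdict(Counter)
    for subj, pred, obj in triplets:
        counts[(subj, obj)][pred] += 1

    prior = {}
    for pair, pred_counts in counts.items():
        total = sum(pred_counts.values()) + smoothing * len(predicates)
        prior[pair] = {p: (pred_counts[p] + smoothing) / total for p in predicates}
    return prior


def counterfactual_scores(full_scores, context_only_scores):
    """Subtract the scores obtained from context alone from the full scores,
    leaving the part of each score not explained by context."""
    return {p: full_scores[p] - context_only_scores[p] for p in full_scores}


# Hypothetical predicate vocabulary and training triplets.
PREDICATES = ["holding", "looking_at", "sitting_on"]
train_triplets = [
    ("person", "holding", "cup"),
    ("person", "holding", "cup"),
    ("person", "looking_at", "cup"),
    ("person", "sitting_on", "chair"),
]
prior = build_cooccurrence_prior(train_triplets, PREDICATES)

# Hypothetical model scores for one (person, cup) pair in one frame.
full = {"holding": 2.0, "looking_at": 1.5, "sitting_on": 0.2}
context_only = {"holding": 1.2, "looking_at": 0.1, "sitting_on": 0.1}
debiased = counterfactual_scores(full, context_only)
best_predicate = max(debiased, key=debiased.get)
```

In this toy example the context alone already favors "holding", so subtracting the context-only scores shifts the top prediction away from it. How the paper actually defines the counterfactual input and combines it with the prior cannot be determined from the abstract.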

Keywords:
Spurious relationship; Computer science; Counterfactual thinking; Predicate (mathematical logic); Inference; Correlation; Graph; Artificial intelligence; Theoretical computer science; Pattern recognition (psychology); Machine learning; Mathematics

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 0.91
Refs: 36
Citation Normalized Percentile: 0.71

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
