JOURNAL ARTICLE

Towards Confidence-Aware Commonsense Knowledge Integration for Scene Graph Generation

Abstract

Commonsense knowledge has been widely explored to improve Scene Graph Generation (SGG). Existing methods simply inject the relations described in knowledge bases into every part of the scene to obtain a concrete understanding. However, they do not consider whether a given visual scene actually needs commonsense knowledge to make inferences. Specifically, the difficulty of relation recognition varies with the relation type. Some frequent spatial relations (e.g. on) usually produce few perception errors even without any prior information, whereas others that involve many rules and patterns (e.g. throwing) have few samples and need commonsense knowledge as a supplement. In this paper, we propose a novel confidence-aware commonsense knowledge integration method for SGG. First, we use mutual information maximization to design a hybrid-attention module, which reduces the uncertainty of representation learning given external knowledge. Second, we add an extra branch to the SGG network that performs confidence estimation without relying on any ground-truth labels; its output scalar explicitly reflects the difficulty of visual recognition. This value is used to balance the demand for commonsense knowledge in a given scene. Experiments with the MOTIFS backbone on Visual Genome (VG) show that our method effectively improves mRecall with only a small drop in Recall, especially for predicting unseen relations.
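The abstract does not state how the confidence scalar enters the network. The sketch below illustrates one simple, hypothetical way to "balance the demand for commonsense knowledge": a gate that scales a knowledge-derived logit vector by (1 - c), where c is the confidence in (0, 1). The functions `gated_fusion` and `sigmoid`, the fusion rule, and all numbers are assumptions for illustration only, not the authors' method.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Squash a raw confidence-branch output into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(visual_logits: List[float],
                 knowledge_logits: List[float],
                 confidence_raw: float) -> List[float]:
    """Hypothetical confidence-gated fusion of predicate logits.

    The knowledge term is weighted by (1 - c): when visual recognition is
    easy (c near 1) the prediction relies mostly on visual evidence; when it
    is hard (c near 0) the commonsense prior contributes almost fully.
    """
    if len(visual_logits) != len(knowledge_logits):
        raise ValueError("logit vectors must have the same length")
    c = sigmoid(confidence_raw)
    weight = 1.0 - c
    return [v + weight * k for v, k in zip(visual_logits, knowledge_logits)]


if __name__ == "__main__":
    predicates = ["on", "holding", "throwing"]
    visual = [2.0, 1.0, 0.5]       # illustrative visual logits
    knowledge = [0.0, 0.5, 2.5]    # illustrative prior favouring "throwing"
    for raw in (4.0, -4.0):        # confident vs. uncertain branch output
        fused = gated_fusion(visual, knowledge, raw)
        best = predicates[fused.index(max(fused))]
        print(f"confidence={sigmoid(raw):.2f} -> {best}")
    # prints:
    # confidence=0.98 -> on
    # confidence=0.02 -> throwing
```

With a confident branch output the frequent relation "on" wins on visual evidence alone, while a low confidence lets the prior tip the prediction toward the rarer "throwing". A learned gate or attention weighting could replace the fixed (1 - c) rule.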

Keywords:
Commonsense knowledge; Commonsense reasoning; Computer science; Artificial intelligence; Ground truth; Natural language processing; Machine learning; Domain knowledge

Metrics

Cited by: 3
FWCI (Field Weighted Citation Impact): 0.55
References: 32
Citation Normalized Percentile: 0.61

Topics

Multimodal Machine Learning Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Advanced Image and Video Retrieval Techniques (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Domain Adaptation and Few-Shot Learning (Physical Sciences → Computer Science → Artificial Intelligence)