JOURNAL ARTICLE

Zero-shot Scene Graph Generation via Triplet Calibration and Reduction

Jiankai Li, Yunhong Wang, Weixin Li

Year: 2023
Journal: ACM Transactions on Multimedia Computing, Communications, and Applications
Vol: 20 (1)
Pages: 1-21
Publisher: Association for Computing Machinery

Abstract

Scene Graph Generation (SGG) plays a pivotal role in downstream vision-language tasks. Existing SGG methods typically suffer from poor compositional generalization to unseen triplets. They are generally trained on incompletely annotated scene graphs that contain dominant triplets, and they tend to be biased toward these seen triplets during inference. To address this issue, we propose a Triplet Calibration and Reduction (T-CAR) framework in this article. In our framework, a triplet calibration loss is first presented to regularize the representations of diverse triplets and to simultaneously excavate the unseen triplets in incompletely annotated training scene graphs. Moreover, the unseen space of scene graphs is usually several times larger than the seen space, since it contains a huge number of unrealistic compositions. Thus, we propose an unseen space reduction loss to shift the attention of excavation to reasonable unseen compositions to facilitate the model training. Finally, we propose a contextual encoder to improve the compositional generalization to unseen triplets by explicitly modeling the relative spatial relations between subjects and objects. Extensive experiments show that our approach achieves consistent improvements for zero-shot SGG over state-of-the-art methods. The code is available at https://github.com/jkli1998/T-CAR.
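
To make the seen/unseen distinction in the abstract concrete, the following minimal sketch enumerates every (subject, predicate, object) composition over a toy vocabulary and splits it into triplets that occur in the training annotations and triplets that do not. It is an illustration only, not the T-CAR implementation: the function name `split_triplet_space` and the toy vocabulary are assumptions introduced here, and the sketch does not attempt to model either of the proposed losses.

```python
from itertools import product


def split_triplet_space(object_classes, predicates, annotated_triplets):
    """Partition all (subject, predicate, object) compositions into seen and unseen.

    Hypothetical helper for illustration; not taken from the T-CAR code base.
    """
    # Every composition the vocabulary allows, realistic or not.
    full_space = set(product(object_classes, predicates, object_classes))
    # Triplets that appear at least once in the training annotations.
    seen = set(annotated_triplets) & full_space
    # Everything else is "unseen" from the model's point of view.
    unseen = full_space - seen
    return seen, unseen


# Toy vocabulary (assumed for this example, not from any SGG dataset).
objects = ["person", "horse", "hat"]
predicates = ["riding", "wearing"]
train_triplets = [
    ("person", "riding", "horse"),
    ("person", "wearing", "hat"),
    ("person", "riding", "horse"),  # repeated annotations collapse into one entry
]

seen, unseen = split_triplet_space(objects, predicates, train_triplets)
print(len(seen), len(unseen))  # 2 16
```

Even in this tiny example the unseen set is several times larger than the seen set, and most of its members, such as ("hat", "riding", "person"), are unrealistic compositions. This is the situation the abstract describes as motivating a loss that focuses on reasonable unseen compositions.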

Keywords:
Inference; Computer science; Encoder; Scene graph; Graph; Reduction (mathematics); Artificial intelligence; Space (punctuation); Calibration; Theoretical computer science; Shot (pellet); Computer vision; Natural language processing; Pattern recognition (psychology); Algorithm; Mathematics

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.55
Refs: 71
Citation Normalized Percentile: 0.60

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
