JOURNAL ARTICLE

IS-GGT: Iterative Scene Graph Generation with Generative Transformers

Abstract

Scene graphs provide a rich, structured representation of a scene by encoding the entities (objects) and their spatial relationships in a graphical format. This representation has proven useful in several tasks, such as question answering, captioning, and even object detection. Current approaches follow a generation-by-classification paradigm in which the scene graph is generated by labeling all possible edges between objects in a scene, which adds computational overhead. This work introduces a generative transformer-based approach to generating scene graphs beyond link prediction. Using two transformer-based components, we first sample a possible scene graph structure from detected objects and their visual features. We then perform predicate classification on the sampled edges to generate the final scene graph. This approach allows us to efficiently generate scene graphs from images with minimal inference overhead. Extensive experiments on the Visual Genome dataset demonstrate the efficiency of the proposed approach. Without bells and whistles, we obtain, on average, 20.7% mean recall (mR@100) across different settings for scene graph generation (SGG), outperforming state-of-the-art SGG approaches while remaining competitive with unbiased SGG approaches.
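
To make the contrast in the abstract concrete, the following is a minimal sketch of the two-stage idea: propose a sparse graph structure first, then classify predicates only on the proposed edges, rather than labeling every possible pair. Everything in it is an illustrative assumption and not the paper's implementation: the toy detections, the nearest-neighbour rule in `propose_edges` (a stand-in for the generative transformer that samples the graph structure), and the geometric rule in `classify_predicate` (a stand-in for the learned predicate classifier).

```python
from itertools import permutations
from math import dist

# Hypothetical toy detections: (label, box centre). Not data from the paper.
OBJECTS = [
    ("man", (2.0, 3.0)),
    ("horse", (2.0, 1.0)),
    ("hat", (2.0, 4.0)),
    ("tree", (9.0, 3.0)),
]

def dense_edges(objects):
    """Generation-by-classification: every ordered pair is a candidate edge."""
    return list(permutations(range(len(objects)), 2))

def propose_edges(objects, k=1):
    """Stage 1 stand-in: link each object to its k nearest neighbours.

    The paper samples structure with a generative transformer; this
    nearest-neighbour rule is only an assumption for illustration.
    """
    edges = []
    for i, (_, centre) in enumerate(objects):
        others = sorted(
            (j for j in range(len(objects)) if j != i),
            key=lambda j: dist(centre, objects[j][1]),
        )
        edges.extend((i, j) for j in others[:k])
    return edges

def classify_predicate(subj, obj):
    """Stage 2 stand-in: a geometric rule instead of a learned classifier."""
    (_, (sx, sy)), (_, (ox, oy)) = subj, obj
    if abs(sy - oy) >= abs(sx - ox):
        return "above" if sy > oy else "below"
    return "right of" if sx > ox else "left of"

def build_scene_graph(objects, edges):
    """Label only the given edges and return (subject, predicate, object) triples."""
    return [
        (objects[s][0], classify_predicate(objects[s], objects[o]), objects[o][0])
        for s, o in edges
    ]

dense = dense_edges(OBJECTS)
sparse = propose_edges(OBJECTS, k=1)
graph = build_scene_graph(OBJECTS, sparse)

print(len(dense), len(sparse))  # 12 4
for triple in graph:
    print(triple)
```

With four objects, the dense approach would classify 12 ordered pairs, while the sketch classifies only the 4 proposed edges. The gap grows quadratically with the number of detected objects, which is the overhead the abstract refers to.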

Keywords:
Computer science, Scene graph, Inference, Transformer, Artificial intelligence, Generative grammar, Generative model, Graph, Pairwise comparison, Pattern recognition (psychology), Computer vision, Theoretical computer science, Rendering (computer graphics)

Metrics

Cited By: 20
FWCI (Field Weighted Citation Impact): 3.64
Refs: 57
Citation Normalized Percentile: 0.92

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Video Analysis and Summarization
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Context-aware Scene Graph Generation with Seq2Seq Transformers

Yichao Lu, Himanshu Rai, Jason S. Chang, B. A. Knyazev, Guangwei Yu, Shashank Shekhar, Graham W. Taylor, Maksims Volkovs

Journal: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Year: 2021, Pages: 15911-15921
JOURNAL ARTICLE

Composite Relationship Fields with Transformers for Scene Graph Generation

George Adaimi, David Mizrahi, Alexandre Alahi

Journal: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Year: 2023, Pages: 52-64
JOURNAL ARTICLE

Deep Generative Probabilistic Graph Neural Networks for Scene Graph Generation

Mahmoud Khademi, Oliver Schulte

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, Year: 2020, Vol: 34 (07), Pages: 11237-11245
JOURNAL ARTICLE

SceneFormer: Indoor Scene Generation with Transformers

Xinpeng Wang, Chandan Yeshwanth, Matthias Niesner

Journal: 2021 International Conference on 3D Vision (3DV), Year: 2021, Pages: 106-115