Abstract

An image usually contains not only visual information but also higher-level semantic information. Nevertheless, earlier computer vision algorithms, such as object detection and image classification, use only the visual features of the image. Recently, the rapid growth of scene-graph research in computer vision has raised the challenge of generating structured scene graphs that carry rich semantic information. This paper proposes a one-stage, query-based, end-to-end Transformer model that generates scene graphs using the Hungarian matching algorithm. We develop an anti-bias reasoner module to reduce the impact of the unbalanced data distribution. We also propose a time-division training strategy that improves training efficiency, speeds up convergence, and improves the performance of the trained model. We conducted experiments on the large-scale Visual Genome dataset to validate our method. Compared with existing state-of-the-art methods, our method maintains fast inference while achieving acceptable performance, which makes it better suited to tasks with strict real-time requirements. Our work demonstrates that one-stage methods have great potential in scene graph generation.
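The abstract says the model assigns its predictions to ground truth with the Hungarian matching algorithm but does not give the matching cost. The sketch below illustrates the general idea under stated assumptions: a DETR-style setup in which each query predicts one (subject, predicate, object) triplet as three class-probability vectors, and a cost built only from those probabilities. The paper's actual cost may also include bounding-box terms, which are omitted here. The names `hungarian` and `match_triplets` and the dictionary keys `sub`, `pred`, `obj` are hypothetical, chosen for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical triplet encoding: (subject class, predicate class, object class).
Triplet = Tuple[int, int, int]


def hungarian(cost: Sequence[Sequence[float]]) -> List[int]:
    """Minimum-cost assignment of every row to a distinct column.

    Classic Hungarian (Kuhn-Munkres) algorithm with row/column potentials.
    Requires n_rows <= n_cols; returns assignment[row] = column.
    """
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("need at least as many columns (queries) as rows (targets)")
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-indexed)
    v = [0.0] * (m + 1)  # column potentials; column 0 is a sentinel
    p = [0] * (m + 1)    # p[j] = row currently matched to column j (0 = free)
    way = [0] * (m + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so at least one new column becomes reachable at zero reduced cost.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        # Flip the matching along the augmenting path back to the sentinel.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def match_triplets(gt: List[Triplet], preds: List[Dict[str, List[float]]]) -> List[int]:
    """Match each ground-truth triplet (row) to one prediction query (column).

    Assumed cost: -(p_sub[s] + p_pred[r] + p_obj[o]), so a query is cheap to
    match when it gives high probability to all three target labels.
    Box-regression terms are deliberately left out of this sketch.
    """
    cost = [
        [-(q["sub"][s] + q["pred"][r] + q["obj"][o]) for q in preds]
        for (s, r, o) in gt
    ]
    return hungarian(cost)
```

Because there are usually more queries than ground-truth triplets, each triplet receives exactly one query and the rest stay unmatched. In DETR-style training, unmatched queries are typically supervised toward a "no object" class; the abstract does not say how this paper handles them.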

Keywords:
Computer science, Transformer, Inference, Semantic reasoner, End-to-end principle, Artificial intelligence, Scene graph, Graph, Information leakage, Computer vision, Machine learning, Data mining, Theoretical computer science, Voltage

Metrics

Cited by: 2
FWCI (Field Weighted Citation Impact): 0.36
References: 29
Citation Normalized Percentile: 0.53

Topics

Multimodal Machine Learning Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Advanced Image and Video Retrieval Techniques (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Domain Adaptation and Few-Shot Learning (Physical Sciences → Computer Science → Artificial Intelligence)
