JOURNAL ARTICLE

Improving Predicate Representation in Scene Graph Generation by Self-Supervised Learning

So Hasegawa, Masayuki Hiromoto, Akira Nakagawa, Yuhei Umeda

Year: 2023 Journal: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Vol: 33 Pages: 2739-2748

Abstract

Scene graph generation (SGG) aims to understand sophisticated visual information by detecting triplets of subject, object, and their relationship (predicate). Since the predicate labels are heavily imbalanced, existing supervised methods struggle to improve accuracy for the rare predicates due to insufficient labeled data. In this paper, we propose SePiR, a novel self-supervised learning method for SGG that improves the representation of rare predicates. We first train a relational encoder by contrastive learning without using predicate labels, and then fine-tune a predicate classifier with labeled data. To apply contrastive learning to SGG, we propose a new data augmentation in which subject-object pairs are augmented by replacing their visual features with those from other images having the same object labels. This augmentation increases the variation of the visual features while keeping the relationship between the objects. Comprehensive experimental results on the Visual Genome dataset show that the SGG performance of SePiR is comparable to the state of the art, and with a limited labeled dataset in particular, our method significantly outperforms the existing supervised methods. Moreover, SePiR's improved representation allows a simpler model architecture, resulting in 3.6x and 6.3x reductions in the number of parameters and the inference time, respectively, compared with the existing method.
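
The augmentation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`build_feature_bank`, `augment_pair`), the dictionary layout of a subject-object pair, and the choice to leave bounding boxes untouched are all assumptions made for illustration. The sketch only shows the core idea of swapping each object's visual feature for one taken from another instance with the same object label.

```python
import random
from collections import defaultdict


def build_feature_bank(detections):
    """Group visual feature vectors by object label.

    `detections` is an iterable of (label, feature_vector) pairs pooled
    from many images (hypothetical input format).
    """
    bank = defaultdict(list)
    for label, feature in detections:
        bank[label].append(feature)
    return bank


def augment_pair(pair, bank, rng):
    """Return a copy of `pair` with subject/object features swapped for
    features of same-label objects drawn from `bank`.

    Labels and boxes are copied unchanged (an assumption of this sketch),
    so the relationship between the two objects is meant to be preserved.
    """
    augmented = dict(pair)
    for role in ("subject", "object"):
        candidates = bank.get(pair[f"{role}_label"])
        if candidates:  # keep the original feature if no same-label sample exists
            augmented[f"{role}_feature"] = rng.choice(candidates)
    return augmented


# Toy data: two "person" features and one "horse" feature from other images.
bank = build_feature_bank([
    ("person", [1.0, 0.0]),
    ("person", [0.9, 0.1]),
    ("horse", [0.0, 1.0]),
])
pair = {
    "subject_label": "person", "subject_feature": [0.5, 0.5], "subject_box": (0, 0, 10, 20),
    "object_label": "horse", "object_feature": [0.2, 0.8], "object_box": (5, 5, 30, 30),
}
view = augment_pair(pair, bank, random.Random(0))
```

Under these assumptions, two independently augmented views of the same pair could serve as a positive pair for contrastive training of the relational encoder, without any predicate label being consulted.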

Keywords:
Computer science; Artificial intelligence; Inference; Classifier (UML); Predicate (mathematical logic); Graph; Pattern recognition (psychology); Encoder; Natural language processing; Feature learning; Machine learning; Theoretical computer science

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.16
Refs: 56
Citation Normalized Percentile: 0.29

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
