Local context attention learning for fine-grained scene graph generation

Xuhan Zhu; Ruiping Wang; Xiangyuan Lan; Yaowei Wang

doi:10.1016/j.patcog.2024.110708


JOURNAL ARTICLE


Year: 2024; Journal: Pattern Recognition; Vol: 156; Pages: 110708; Publisher: Elsevier BV



Abstract

Fine-grained scene graph generation aims to parse the objects in a scene and the fine-grained relationships among them. Despite significant progress in recent years, the performance of existing methods is still limited by two major issues: (1) ambiguous perception under a global view; (2) the lack of reliable, fine-grained annotations. We argue that understanding the local context is important for addressing both issues. However, previous works often overlook it, which limits their effectiveness in fine-grained scene graph generation. To tackle this challenge, we introduce a Local-context Attention Learning method that concentrates on local context and can generate highly reliable, fine-grained annotations. It comprises two components: (1) The Fine-grained Location Attention Network (FLAN), a multi-branch network with global and local branches, attends to informative local context and perceives the granularity levels of different regions, thereby adaptively enhancing the learning of fine-grained locations. (2) The Fine-grained Location Label Transfer (FLLT) method identifies coarse-grained labels that are inconsistent with the local context, determines which of them should be transferred through a global confidence thresholding strategy, and finally transfers them to reliable fine-grained labels that are consistent with the local context. Experiments on the Visual Genome, OpenImage, and GQA-200 datasets show that the proposed methods achieve significant improvements on the fine-grained scene graph generation task. By addressing the challenge described above, our method also achieves state-of-the-art performance on all three datasets.
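The abstract does not say how FLLT applies its global confidence threshold, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a hypothetical mapping from coarse predicates to fine-grained candidates (`FINE_CANDIDATES`), a hypothetical dictionary of predicate probabilities from a local-context model (`local_probs`), and a single dataset-wide threshold `tau`. A coarse label is replaced only when a fine-grained candidate both outscores it and clears the threshold.

```python
from typing import Dict, List

# Hypothetical coarse-to-fine predicate mapping; not taken from the paper.
FINE_CANDIDATES: Dict[str, List[str]] = {
    "on": ["standing on", "sitting on", "parked on"],
    "has": ["wearing", "holding"],
}


def transfer_label(coarse: str, local_probs: Dict[str, float], tau: float = 0.5) -> str:
    """Return a fine-grained label when the local-context prediction is confident enough.

    coarse:      the original (coarse-grained) annotation
    local_probs: assumed predicate probabilities from a local-context model
    tau:         assumed global confidence threshold shared by all samples
    """
    candidates = FINE_CANDIDATES.get(coarse, [])
    if not candidates:
        # No fine-grained alternative is known, so keep the annotation.
        return coarse

    # Pick the fine-grained candidate the local context supports most.
    best = max(candidates, key=lambda p: local_probs.get(p, 0.0))
    best_prob = local_probs.get(best, 0.0)

    # Treat the coarse label as inconsistent with the local context only if a
    # fine-grained candidate outscores it AND passes the global threshold.
    if best_prob > local_probs.get(coarse, 0.0) and best_prob >= tau:
        return best
    return coarse


if __name__ == "__main__":
    probs = {"on": 0.2, "sitting on": 0.7, "standing on": 0.1}
    print(transfer_label("on", probs))
```

Using one global threshold rather than per-class thresholds keeps the rule simple; whether the paper calibrates the threshold per predicate or per dataset is not stated in the abstract.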

Keywords:

Computer science; Artificial intelligence; Context (archaeology); Granularity; Graph; Scene graph; Machine learning; Rendering (computer graphics); Theoretical computer science

Metrics

FWCI (Field Weighted Citation Impact): 3.18

Citation Normalized Percentile: 0.86

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Neural Network Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition


Related Documents

Fine-Grained Predicates Learning for Scene Graph Generation

Hierarchical Memory Learning for Fine-Grained Scene Graph Generation

Adaptive Fine-Grained Predicates Learning for Scene Graph Generation

Environment-Invariant Curriculum Relation Learning for Fine-Grained Scene Graph Generation

Fine-Grained Scene Graph Generation with Data Transfer