Xuhan Zhu, Ruiping Wang, Xiangyuan Lan, Yaowei Wang
Fine-grained scene graph generation aims to parse the objects in a scene and the fine-grained relationships among them. Despite significant progress in recent years, the performance of existing methods is still limited by two major issues: (1) ambiguous perception under a global view and (2) the lack of reliable, fine-grained annotations. We argue that understanding the local context is important for addressing both issues, yet previous works often overlook it, which limits their effectiveness in fine-grained scene graph generation. To tackle this challenge, we introduce a Local-context Attention Learning method that concentrates on local context and can generate highly reliable, fine-grained annotations. It comprises two components: (1) the Fine-grained Location Attention Network (FLAN), a multi-branch network with global and local branches that attends to informative local context and perceives the granularity level of different regions, thereby adaptively enhancing the learning of fine-grained locations; and (2) the Fine-grained Location Label Transfer (FLLT) method, which identifies coarse-grained labels that are inconsistent with the local context, uses a global confidence thresholding strategy to determine which of them should be transferred, and then transfers them to reliable fine-grained labels that are consistent with the local context. Experiments on the Visual Genome, OpenImage, and GQA-200 datasets show that the proposed methods achieve significant improvements on the fine-grained scene graph generation task. By addressing the challenge described above, our method also achieves state-of-the-art performance on all three datasets.
Xinyu Lyu, Lianli Gao, Yuyu Guo, Zhou Zhao, Hao Huang, Heng Tao Shen, Jingkuan Song
Youming Deng, Yansheng Li, Yongjun Zhang, Xiang Xiang, Jian Wang, Jingdong Chen, Jiayi Ma
Xinyu Lyu, Lianli Gao, Pengpeng Zeng, Heng Tao Shen, Jingkuan Song
Ao Zhang, Yuan Yao, Qianyu Chen, Wei Ji, Zhiyuan Liu, Maosong Sun, Tat‐Seng Chua