Xiang Yu, Ruoxin Chen, Jie Li, Jiawei Sun, Shijing Yuan, Huxiao Ji, Xinyu Lu, Chentao Wu
Because training samples cannot cover all possible triples, existing scene graph generation (SGG) methods perform poorly at predicting zero-shot (i.e., unseen) subject-predicate-object triples. To address this problem, we propose a general SGG framework that improves zero-shot performance. The main idea is to generate the information of zero-shot triples before training the predicate classifier, thereby turning the original zero-shot triples into non-zero-shot ones. Specifically, the missing information of zero-shot triples is generated by our proposed knowledge graph completion strategy and then integrated with the visual features of images. Predicate classification for zero-shot triples is therefore no longer treated as a purely visual classification task but is also formulated as predicting missing links in a knowledge graph. Experiments on the Visual Genome dataset demonstrate that our proposed method outperforms state-of-the-art methods on popular zero-shot metrics (i.e., zR@N, ng-zR@N) across all popular SGG tasks.
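The core idea above — treating predicate prediction for an unseen subject-object pair as a link-prediction problem in a knowledge graph — can be illustrated with a minimal TransE-style sketch. This is an assumption-laden illustration, not the paper's actual completion strategy: the embeddings, entity/predicate names, and scoring function here are all hypothetical placeholders.

```python
import numpy as np

# Hypothetical TransE-style link prediction for zero-shot triples.
# Subjects/objects and predicates share one embedding space; a triple
# (s, p, o) is plausible when e_s + e_p is close to e_o.
rng = np.random.default_rng(0)
dim = 8
entities = {name: rng.normal(size=dim) for name in ["person", "horse", "hat"]}
predicates = {name: rng.normal(size=dim) for name in ["riding", "wearing"]}

def score(subj, pred, obj):
    """Negative L2 distance: higher means a more plausible link."""
    return -np.linalg.norm(entities[subj] + predicates[pred] - entities[obj])

def predict_predicate(subj, obj):
    """Rank all candidate predicates for an unseen (subject, object) pair."""
    return max(predicates, key=lambda p: score(subj, p, obj))

# For a subject-object pair never seen during training, the completed
# link supplies the otherwise-missing information that a purely visual
# classifier lacks (here the embeddings are random, so the ranking is
# only a demonstration of the mechanism, not a trained prediction).
print(predict_predicate("person", "horse"))
```

In the actual framework, such completed links would be fused with the image's visual features before predicate classification, so the zero-shot pair is effectively no longer unseen.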
Xiang Yu, Jie Li, Shijing Yuan, Chao Wang, Chentao Wu
Zhiyi Fang, Hang Yu, Changhua Xu, Z. Li, Ying Jie, Shaorong Xie
Chuxu Zhang, Huaxiu Yao, Chao Huang, Meng Jiang, Zhenhui Li, Nitesh V. Chawla
Yuyu Guo, Jingkuan Song, Lianli Gao, Heng Tao Shen