In this paper, we propose a novel 3D scene graph generation model, L3DSG, which makes use of rich prior knowledge obtained from a large language model (LLM) through prompt engineering. The proposed model builds upon our previous 3D scene graph generation model, C3DSG, which adopts Point Transformer as its 3D geometric feature extractor and uses the NE-GAT graph neural network as its context reasoner. The new model addresses the inability of C3DSG to utilize prior knowledge about indoor physical environments. It focuses on how to obtain prior knowledge from an LLM and how to use it effectively for predicting objects and their relations. To this end, the proposed model extends C3DSG with several modules that prompt the LLM and then encode and fuse the prior knowledge it returns. Through various experiments on the 3DSSG benchmark dataset, we show the superiority of the proposed model.