Graph neural networks (GNNs) on heterogeneous graphs have shown superior performance and attracted considerable research interest. However, many applications require GNNs to make predictions on test examples that are distributed differently from the training ones, while task-specific labeled data are often prohibitively expensive to obtain. An effective approach to this challenge is to pre-train an expressive GNN model on unlabeled data and then fine-tune it on a downstream task of interest. While pre-training has proven effective on homogeneous graphs, pre-training a GNN on a heterogeneous graph remains an open question: such a graph contains different types of nodes and edges, which raises new challenges of structural heterogeneity for graph pre-training. To capture the structural and semantic properties of heterogeneous graphs simultaneously, in this paper we develop a new strategy for Pre-training Heterogeneous Graph Neural Networks (PHGNN). The key to the success of PHGNN is that it uses two different tasks to capture two kinds of similarity in a heterogeneous graph: the similarity between nodes of the same type and the similarity between nodes of different types. In addition, PHGNN introduces an attribute type prediction task to preserve node attribute information. We systematically study pre-training on two real-world heterogeneous graphs, and the results demonstrate that PHGNN significantly improves generalization across downstream tasks.
Hangjun Yang, Linsen Li, Lingxuan Zhang, Junhua Tang, Zhongwei Chen