With the rapid development of the Internet, online news text data is growing explosively, and traditional manual annotation can no longer keep up with the scale of current news classification tasks. Most current news text classification methods use the pre-trained language model BERT to extract contextual semantic information and achieve good results. However, BERT cannot make full use of the structural features of the text, whereas a Graph Convolutional Network (GCN) has a natural advantage in modeling textual structure. To exploit both the structural and the semantic information of news texts and thereby improve classification accuracy, this paper proposes a BERT-Enhanced Graph Convolutional Network model (BEGCN). For each news text, a text graph is constructed from word co-occurrence and enriched with a semantic dictionary, and structural features are extracted from it using GCN. In parallel, semantic features are extracted using BERT. These two features of different granularity interact through a multi-head attention module, and three aggregation methods are used to combine them, so that they influence each other and yield a more effective final representation. Experiments on three news text datasets demonstrate the effectiveness of the proposed model.
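The graph branch described above can be illustrated with a minimal sketch. It builds a word co-occurrence graph for a single text and applies one standard GCN propagation step. It is not the BEGCN implementation: the window size, the one-hot node features, the random untrained weights, and the function names `build_cooccurrence_graph` and `gcn_layer` are all assumptions made for illustration, and the semantic dictionary, BERT branch, and multi-head attention module are omitted.

```python
import numpy as np


def build_cooccurrence_graph(tokens, window=3):
    """Return (vocab, A): A[i, j] counts how often words i and j
    appear within `window` consecutive positions of each other."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(vocab)))
    for p, word in enumerate(tokens):
        # Pair the word at position p with the next (window - 1) words.
        for q in range(p + 1, min(p + window, len(tokens))):
            i, j = index[word], index[tokens[q]]
            if i != j:
                A[i, j] += 1.0
                A[j, i] += 1.0
    return vocab, A


def gcn_layer(A, X, W):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)   # ReLU activation


tokens = "stocks fall as markets react to rate news".split()
vocab, A = build_cooccurrence_graph(tokens, window=3)

rng = np.random.default_rng(0)
X = np.eye(len(vocab))                       # one-hot node features (assumption)
W = rng.normal(size=(len(vocab), 4))         # random, untrained weights (assumption)
H = gcn_layer(A, X, W)                       # one structural feature vector per word
print(H.shape)
```

In a full model, the node representations `H` would be pooled or passed on to interact with the BERT features; how that is done is specific to the paper and is not reproduced here.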
Mo Chen, Chunlong Yao, Xu Li, Lan Shen
Loc Tran, Lam Pham, Tuan Tran, An Mai
Bingxin Xue, Cui Zhu, Xuan Wang, Wenjun Zhu