Chinese text classification is an important task in data mining that extracts category features from unstructured content. Conventional Chinese text classification models leverage only the surface features of the original text and thus omit the potential extended knowledge associated with each word. To capture the semantic features of each word more comprehensively, this paper proposes a Chinese news text classification algorithm based on online knowledge extension and a convolutional neural network (OKE-CNN), which uses a knowledge graph to extend latent semantic information and a CNN to predict the category. Unlike the baseline methods, OKE-CNN exploits surface and latent features simultaneously, so it can adapt to complex scenarios such as sparse data and unclear topics. In our experiments, OKE-CNN outperforms SOTA competitors, achieving 97.94% and 87.03% on the THUCNews and TouTiao datasets, respectively.
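The data flow described above (extend each word with related knowledge, then classify the extended sequence with a CNN) can be illustrated with a minimal sketch. This is not the OKE-CNN implementation: the toy knowledge graph `TOY_KG`, the placeholder `embed` function, and the fixed `FILTERS` and `CLASS_WEIGHTS` are all hypothetical, the weights are untrained, and the online knowledge lookup is replaced here by a local dictionary lookup.

```python
from typing import Dict, List

# Hypothetical toy knowledge graph: word -> related concepts.
# A real system would query an online knowledge base instead.
TOY_KG: Dict[str, List[str]] = {
    "苹果": ["公司", "手机"],
    "发布": ["产品"],
}

DIM = 4    # embedding size (assumption)
WIDTH = 2  # convolution window, in tokens (assumption)

# Untrained, hand-fixed weights; they only make the sketch deterministic.
FILTERS = [[((i + j) % 3 - 1) * 0.5 for j in range(WIDTH * DIM)] for i in range(3)]
CLASS_WEIGHTS = [[1.0, -0.5, 0.25], [-1.0, 0.5, 0.75]]


def extend_tokens(tokens: List[str], kg: Dict[str, List[str]]) -> List[str]:
    """Insert the related concepts of each token right after the token."""
    extended: List[str] = []
    for tok in tokens:
        extended.append(tok)
        extended.extend(kg.get(tok, []))
    return extended


def embed(token: str, dim: int = DIM) -> List[float]:
    """Deterministic placeholder embedding (stands in for learned vectors)."""
    code = sum(ord(ch) for ch in token)
    return [((code * (i + 1)) % 17) / 17.0 - 0.5 for i in range(dim)]


def conv_max_pool(vectors: List[List[float]], filters: List[List[float]],
                  width: int) -> List[float]:
    """1D convolution over token windows, ReLU, then max-over-time pooling."""
    features = []
    for f in filters:
        best = 0.0  # ReLU floor: negative activations are clipped to zero
        for start in range(len(vectors) - width + 1):
            window = [x for vec in vectors[start:start + width] for x in vec]
            best = max(best, sum(w * x for w, x in zip(f, window)))
        features.append(best)
    return features


def classify(tokens: List[str]) -> int:
    """Return the index of the highest-scoring class for a token list."""
    vectors = [embed(t) for t in extend_tokens(tokens, TOY_KG)]
    features = conv_max_pool(vectors, FILTERS, WIDTH)
    scores = [sum(w * f for w, f in zip(row, features)) for row in CLASS_WEIGHTS]
    return scores.index(max(scores))
```

Calling `extend_tokens(["苹果", "发布", "新品"], TOY_KG)` yields the original tokens interleaved with their related concepts, so the convolution windows see both surface and extended features. Placing the extended concepts next to their source word is one possible design choice; the paper may combine the two feature types differently.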
Kai-Feng Liu, Yu Zhang, Quan-Xin Zhang, Yan-Ge Wang, Kai-Long Gao
Jincheng Li, Penghai Zhao, Yusheng Hao, Qiang Lin, Weilan Wang