Short text clustering is used in many applications and has become an important problem, but it remains challenging because traditional short text representations are sparse. Early methods either waste space or ignore word order. To address these problems, a self-taught convolutional neural network model has been proposed to construct short text representations. However, that model extracts semantic information only from the word context, uses no other unsupervised features, and ignores the fact that different parts of the text contribute differently to clustering. In this paper, we propose an effective short text embedding method for clustering based on word and topic semantic information (STE-WT). By exploiting topic semantic information and using an attention mechanism to capture the differing contributions of the content, the proposed model constructs markedly better short text representations for clustering. Extensive experiments on real datasets demonstrate the effectiveness of our framework and its superiority over state-of-the-art methods.
Supakpong Jinarat, Bundit Manaskasemsak, Arnon Rungsawang
Amir Mehdi Ghazifard, Mohammadreza Shams, Zeinab Shamaee
Marcelo Pita, Matheus Nunes, Gisele L. Pappa
Xin Zuo, Huanhuan Hu, Weiming Zhang, Nenghai Yu