Short Text Embedding for Clustering Based on Word and Topic Semantic Information

Ziheng Chen; Jiangtao Ren

doi:10.1109/dsaa.2019.00020


JOURNAL ARTICLE



Year: 2019 Vol: 333 Pages: 61-70



Abstract

Short text clustering is used in many applications and has become an important problem, but it remains challenging because traditional short text representations are sparse. Early methods either waste space or ignore word order. To tackle these problems, a self-taught convolutional neural network model was proposed to construct short text representations. However, that model extracts semantic information only from word context, uses no other unsupervised features, and ignores the fact that different parts of the textual content contribute differently to clustering. In this paper, we propose an effective short text embedding method for clustering based on word and topic semantic information (STE-WT). By exploiting topic semantic information and using an attention mechanism to capture the differing contributions of the content, the proposed model constructs better short text representations for clustering. Extensive experiments on real datasets demonstrate the effectiveness and superiority of our framework compared with state-of-the-art methods.
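The abstract contrasts sparse traditional representations with dense embeddings, and mentions weighting textual content with attention and adding topic information. The following is a minimal, self-contained Python sketch of those general ideas only; it is not the STE-WT model. The toy vocabulary, the 2-dimensional word vectors, the fixed query vector, dot-product attention scoring, and plain concatenation with a topic distribution are all illustrative assumptions, and the helper names (`bag_of_words`, `attention_pool`, `short_text_embedding`) are hypothetical. In STE-WT the corresponding components are learned, and the abstract does not specify how word and topic information are combined.

```python
import math
from typing import List, Sequence, Tuple


def bag_of_words(text: str, vocab: Sequence[str]) -> List[int]:
    """Count vector over a fixed vocabulary: the sparse 'traditional' representation."""
    index = {word: i for i, word in enumerate(vocab)}
    counts = [0] * len(vocab)
    for token in text.lower().split():
        if token in index:
            counts[index[token]] += 1
    return counts


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(word_vectors: Sequence[Sequence[float]],
                   query: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Weighted average of word vectors, with weights = softmax(dot(word, query)).

    Assumes at least one word vector. The query vector is a hypothetical
    stand-in for the learned parameters a real attention layer would use.
    """
    scores = [sum(a * b for a, b in zip(vec, query)) for vec in word_vectors]
    weights = softmax(scores)
    dim = len(word_vectors[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, word_vectors))
              for d in range(dim)]
    return pooled, weights


def short_text_embedding(word_vectors: Sequence[Sequence[float]],
                         topic_vector: Sequence[float],
                         query: Sequence[float]) -> List[float]:
    """Hypothetical combination: attention-pooled words concatenated with topic proportions."""
    pooled, _ = attention_pool(word_vectors, query)
    return pooled + list(topic_vector)


if __name__ == "__main__":
    vocab = ["apple", "iphone", "new", "releases",
             "stock", "market", "rain", "weather"]
    print(bag_of_words("Apple releases new iPhone", vocab))
    # -> [1, 1, 1, 1, 0, 0, 0, 0]

    words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy 2-d word vectors
    topic = [0.7, 0.3]                             # toy topic distribution
    query = [2.0, 0.0]                             # hypothetical attention query
    print(short_text_embedding(words, topic, query))
```

The bag-of-words vector shows why short texts are sparse: with a realistic vocabulary almost every entry is zero. In the attention sketch, the word whose vector aligns best with the query receives the largest weight, which is one simple way to let different words contribute differently to the final representation before a clustering algorithm is applied.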

Keywords:

Computer science; Cluster analysis; Word embedding; Artificial intelligence; Natural language processing; Word; Document clustering; Context; Construct; Task; Embedding; Information retrieval; Mathematics

Metrics


FWCI (Field Weighted Citation Impact): 0.31


Citation Normalized Percentile: 0.68


Topics

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)

Text and Document Classification Technologies (Physical Sciences → Computer Science → Artificial Intelligence)

Advanced Text Analysis Techniques (Physical Sciences → Computer Science → Artificial Intelligence)


Related Documents

Short Text Clustering Based on Word Semantic Graph with Word Embedding Model

Topic word set-based text clustering

Probabilistic topic modeling for short text based on word embedding networks

Short Text Classification Based on Latent Topic Modeling and Word Embedding

Text Semantic Steganalysis Based on Word Embedding