JOURNAL ARTICLE

Short Text Classification Based on Latent Topic Modeling and Word Embedding

Peng Li, Junqing He, Chenglong Ma

Year: 2017
Journal: DEStech Transactions on Computer Science and Engineering
Publisher: Destech Publications

Abstract

With the rapid development of social networks and e-commerce, we are exposed to an enormous amount of short text every day, ranging from tweets, movie comments, and search snippets to news summaries. Classifying short, sparse text accurately is a basic requirement for handling information efficiently. However, previous methods fail to achieve high performance because their text representations are sparse and carry little semantic information. The key to a breakthrough lies in an appropriate representation of the words, around which we devise a new framework. By discovering latent topics in related data crawled from the web, we obtain a topic distribution that describes the general content of a text. Combined with word embeddings generated from general online data, the proposed representation is denser and contains semantic information from two different aspects. With this semantic representation of the texts, the framework greatly outperforms previous methods even when using the most common SVM classifier, improving accuracy by 11.58% on a standard data set.
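The combination of the two representations described in the abstract can be illustrated with a minimal sketch. The abstract does not say how word embeddings are pooled or how the two parts are joined, so mean pooling and plain concatenation are assumptions here, and the names `average_embedding` and `combine_features`, as well as all toy values, are hypothetical. In a full pipeline the topic distribution would come from a trained topic model and the resulting vector would be passed to an SVM classifier.

```python
from typing import Dict, List, Sequence


def average_embedding(tokens: Sequence[str],
                      embeddings: Dict[str, List[float]],
                      dim: int) -> List[float]:
    """Mean of the embedding vectors of the tokens found in the lookup table.

    Mean pooling is an assumption; out-of-vocabulary tokens are skipped.
    """
    total = [0.0] * dim
    found = 0
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is None:
            continue  # token has no embedding
        for i in range(dim):
            total[i] += vec[i]
        found += 1
    if found == 0:
        return total  # all-zero vector when no token is known
    return [x / found for x in total]


def combine_features(topic_dist: Sequence[float],
                     tokens: Sequence[str],
                     embeddings: Dict[str, List[float]],
                     dim: int) -> List[float]:
    """Concatenate a topic distribution with the pooled word embedding.

    Concatenation is an assumption about how the two views are combined.
    """
    return list(topic_dist) + average_embedding(tokens, embeddings, dim)


# Toy, made-up values for illustration only.
toy_embeddings = {"cheap": [1.0, 0.0], "flights": [0.0, 1.0]}
toy_topic_dist = [0.75, 0.25]  # would come from a topic model in practice

features = combine_features(
    toy_topic_dist, ["cheap", "flights", "today"], toy_embeddings, dim=2
)
print(features)  # [0.75, 0.25, 0.5, 0.5]
```

The resulting vector is dense even for a very short text: the first part reflects the general topic of the text, the second part the meaning of its individual words.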

Keywords:
Computer science, Word embedding, Embedding, Breakout, Information retrieval, Representation (politics), Artificial intelligence, Word (group theory), Classifier (UML), Probabilistic latent semantic analysis, Natural language processing, Set (abstract data type), Key (lock), Latent semantic analysis, Data set, Support vector machine

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 14
Citation Normalized Percentile: 0.02

Topics

Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Web Data Mining and Analysis
Physical Sciences →  Computer Science →  Information Systems