JOURNAL ARTICLE

A weighted topical document embedding based clustering method for news text

Hui Song, Zhu Dechao

Year: 2016 Journal: 2016 IEEE Information Technology, Networking, Electronic and Automation Control Conference Vol: 13 Pages: 1060-1065

Abstract

As an unsupervised machine learning method, clustering can group text in a preliminary way without manual labeling, which speeds up the organization, abstraction, and navigation of large news collections. News articles are long and contain many homonyms and polysemous words, which is one reason traditional text clustering methods perform poorly on news text. This paper presents a novel text representation method based on topical document embedding (TDE) to capture the semantic features of different topics. In the TDE representation, the document embedding of a news text is obtained by summing the Skip-Gram word vectors of all keywords in the text, each weighted by its TF-IDF score. The topical document embedding is then learned by joining the topic vectors obtained from an LDA model with the document vectors from the document embedding. Using the topical document embedding for clustering, we implement a novel text clustering method (TDE-TC). Experimental results show that news clustering based on the TDE representation outperforms clustering based on the bag-of-words model and the LDA model.
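
The construction described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the toy corpus, the hand-written `word_vectors` (a stand-in for Skip-Gram output), the `topic_vectors` (a stand-in for per-document LDA topic distributions), and all function names are hypothetical. It also assumes that every token counts as a keyword, that TF-IDF is term frequency times log(N/df), and that "joining" means concatenation; the abstract does not confirm any of these details.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: tf-idf} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def document_embedding(weights, word_vectors, dim):
    """Sum the word vectors, each scaled by its term's TF-IDF weight."""
    vec = [0.0] * dim
    for term, weight in weights.items():
        word_vec = word_vectors.get(term)
        if word_vec is None:  # skip out-of-vocabulary terms
            continue
        for i in range(dim):
            vec[i] += weight * word_vec[i]
    return vec


def topical_document_embedding(doc_vec, topic_vec):
    """Join both representations (assumption: plain concatenation)."""
    return list(doc_vec) + list(topic_vec)


# Toy data; all values are made up for illustration.
docs = [
    ["bank", "loan", "rate"],
    ["river", "bank", "flood"],
]
word_vectors = {  # hypothetical stand-in for Skip-Gram vectors
    "bank": [0.5, 0.5],
    "loan": [1.0, 0.0],
    "rate": [0.9, 0.1],
    "river": [0.0, 1.0],
    "flood": [0.1, 0.9],
}
topic_vectors = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical stand-in for LDA output

weights = tfidf_weights(docs)
tde = [
    topical_document_embedding(document_embedding(w, word_vectors, 2), t)
    for w, t in zip(weights, topic_vectors)
]
```

In this toy corpus the ambiguous word "bank" occurs in both documents, so its TF-IDF weight is zero and the remaining terms separate the two vectors. The resulting `tde` vectors could then be passed to a standard clustering algorithm; the abstract does not name the one used in TDE-TC.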

Keywords:
Computer science; Cluster analysis; Document clustering; Embedding; Polysemy; Representation (politics); Artificial intelligence; Word embedding; Information retrieval; Set (abstract data type); Word (group theory); Natural language processing; Abstraction; Pattern recognition (psychology); Mathematics

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 0.19
Refs: 15
Citation Normalized Percentile: 0.45

Topics

Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Web Data Mining and Analysis
Physical Sciences →  Computer Science →  Information Systems

Related Documents

BOOK-CHAPTER

A Text Document Clustering Method Based on Topical Concept

Yi Ding, Xian Fu

Advances in intelligent and soft computing Year: 2012 Pages: 547-552

JOURNAL ARTICLE

A Text Document Clustering Method Based on Weighted BERT Model

Yutong Li, Juanjuan Cai, Jingling Wang

Journal: 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC) Year: 2020

JOURNAL ARTICLE

Topical Concept Based Text Clustering Method

Yi Ding, Xian Fu

Journal: Advanced materials research Year: 2012 Vol: 532-533 Pages: 939-943

JOURNAL ARTICLE

GAE-Based Document Embedding Method for Clustering

Sungwon Jung, Sangmin Ka

Journal: IEEE Access Year: 2022 Vol: 10 Pages: 130089-130096