JOURNAL ARTICLE

Text clustering algorithm based on deep representation learning

Binyu Wang, Wenfen Liu, Zijie Lin, Xuexian Hu, Jianghong Wei, Chun Liu

Year: 2018
Journal: The Journal of Engineering
Vol: 2018 (16)
Pages: 1407-1414
Publisher: Institution of Engineering and Technology

Abstract

Text clustering is an important method for effectively organising, summarising, and navigating text information. However, because the text data to be clustered carries no labels, it cannot be used to train a deep learning-based text representation model. To address this problem, a text clustering algorithm based on deep representation learning is proposed, which uses transfer-learning domain adaptation and updates the model parameters during the clustering iterations. First, source domain data is used to pre-train the deep learning classification model, which initialises the model parameters. Then, a domain discriminator is added to the model to predict which domain each input sample belongs to. If the discriminator cannot distinguish which domain the data belongs to, a common feature space of the two domains has been obtained, and the domain adaptation problem is solved. Finally, the text feature vectors produced by the model are clustered with the MCSKM++ algorithm. The algorithm not only resolves the model pre-training problem in unsupervised clustering, but also performs well on the transfer problem caused by the domains having different numbers of labels. Experiments suggest that the clustering accuracy of the algorithm is superior to that of other similar algorithms.
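
The final clustering step can be illustrated with a minimal sketch. The abstract does not describe the internals of MCSKM++, so the code below is not that algorithm: it swaps in plain spherical k-means with k-means++-style seeding as a stand-in. The names `normalize`, `cosine_distance`, `seed_centroids`, and `cluster_features` are hypothetical, and the input vectors are assumed to be the text feature vectors produced by the domain-adapted model.

```python
import math
import random


def normalize(vector):
    """Scale a vector to unit length so only its direction matters."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def cosine_distance(a, b):
    """1 - cosine similarity for unit vectors, clamped at 0 against rounding."""
    return max(0.0, 1.0 - sum(x * y for x, y in zip(a, b)))


def seed_centroids(points, k, rng):
    """k-means++-style seeding (a stand-in, NOT the paper's MCSKM++).

    For unit vectors, cosine distance equals half the squared Euclidean
    distance, so weighting by it matches the usual squared-distance rule.
    """
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(cosine_distance(p, c) for c in centroids) for p in points]
        if sum(weights) > 0:
            centroids.append(rng.choices(points, weights=weights, k=1)[0])
        else:
            # All points coincide with existing centroids: fall back to uniform.
            centroids.append(rng.choice(points))
    return centroids


def cluster_features(vectors, k, iterations=20, seed=0):
    """Assign each feature vector to one of k clusters by cosine distance."""
    rng = random.Random(seed)
    points = [normalize(v) for v in vectors]
    centroids = seed_centroids(points, k, rng)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by cosine distance.
        labels = [
            min(range(k), key=lambda j: cosine_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: re-normalised mean of each non-empty cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return labels


# Hypothetical feature vectors standing in for the model's output.
features = [[1.0, 0.01, 0.0], [1.0, 0.0, 0.01], [0.0, 1.0, 0.01], [0.01, 1.0, 0.0]]
print(cluster_features(features, k=2))
```

Cosine distance is used here because it ignores vector length, which is a common choice for text features; whether the authors' method makes the same choice is not stated in the abstract.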

Keywords:
Cluster analysis; Computer science; Discriminator; Artificial intelligence; Representation (politics); Domain (mathematical analysis); Feature learning; Pattern recognition (psychology); Feature (linguistics); Correlation clustering; Domain adaptation; Canopy clustering algorithm; Deep learning; Transfer of learning; Algorithm; Machine learning; Mathematics

Metrics

Cited By: 13
FWCI (Field Weighted Citation Impact): 1.39
Refs: 20
Citation Normalized Percentile: 0.84

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Sentiment Analysis and Opinion Mining
Physical Sciences →  Computer Science →  Artificial Intelligence