JOURNAL ARTICLE

A Text Document Clustering Method Based on Weighted BERT Model

Yutong Li, Juanjuan Cai, Jingling Wang

Year: 2020 Journal: 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC)

Abstract

Traditional text document clustering methods represent documents with uncontextualized word embeddings and the vector space model, which neglect polysemy and the semantic relations between words. This paper presents a novel text document clustering method to address these problems. First, the pre-trained language representation model Bidirectional Encoder Representations from Transformers (BERT) is used to generate sentence embeddings. Then, two sentence-level weighting schemes based on named entities are designed to improve performance. Finally, the k-means clustering algorithm is applied to find groups of similar documents. Experimental results on four datasets indicate that the proposed weighted method achieves higher accuracy than the unweighted averaging method. Friedman tests conducted separately on F1 scores and Adjusted Rand Index (ARI) values both confirm the better overall performance of the proposed method.
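
The three stages described in the abstract (sentence embeddings, sentence-level weighting, k-means) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the 2-dimensional vectors are toy stand-ins for BERT sentence embeddings, the named-entity counts are made up, and the weighting rule (weight = 1 + number of named entities in the sentence) is a hypothetical placeholder, since the abstract does not specify the two schemes the authors designed. The k-means routine is a plain Lloyd's iteration with a deterministic farthest-point initialization, chosen only to keep the example reproducible.

```python
def weighted_document_embedding(sentence_vectors, entity_counts):
    """Combine sentence vectors into one document vector.

    Hypothetical weighting (an assumption, not the paper's scheme):
    each sentence gets weight 1 + (named entities it contains),
    and the weights are normalized to sum to 1.
    """
    weights = [1.0 + count for count in entity_counts]
    total = sum(weights)
    dim = len(sentence_vectors[0])
    doc = [0.0] * dim
    for vec, w in zip(sentence_vectors, weights):
        for i in range(dim):
            doc[i] += (w / total) * vec[i]
    return doc


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point farthest from its nearest existing centroid.
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy corpus: (sentence vectors, named-entity count per sentence).
# In a real pipeline the vectors would come from a BERT encoder and the
# counts from a named-entity recognizer; both are invented here.
docs = [
    ([[1.0, 0.0], [0.8, 0.2]], [2, 0]),
    ([[0.9, 0.1], [0.0, 1.0]], [3, 0]),
    ([[0.0, 1.0], [0.2, 0.8]], [1, 1]),
    ([[0.1, 0.9], [1.0, 0.0]], [4, 0]),
]

doc_vectors = [weighted_document_embedding(vecs, counts) for vecs, counts in docs]
labels = kmeans(doc_vectors, k=2)
print(labels)
```

In this toy setup, sentences with more named entities pull the document vector toward their own embedding, so a document with one entity-rich sentence and one off-topic sentence stays close to the entity-rich one. Replacing the weights with a uniform value gives the unweighted average that the abstract uses as the baseline.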

Keywords:
Computer science; Cluster analysis; Sentence; Artificial intelligence; Polysemy; Weighting; Vector space model; Document clustering; Natural language processing; Encoder; Word (group theory); Language model; Data mining; Mathematics

Metrics

Cited By: 25
FWCI (Field Weighted Citation Impact): 1.43
Refs: 18
Citation Normalized Percentile: 0.85

Topics

Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
