JOURNAL ARTICLE

Text Document Clustering Using Modified Particle Swarm Optimization with k-means Model

Ratnam Dodda, A. Suresh Babu

Year: 2023   Journal: International Journal of Artificial Intelligence Tools   Vol: 33 (01)   Publisher: World Scientific

Abstract

In the present digital era, millions of Internet users generate vast amounts of data in the form of unstructured text documents. Clustering and organizing these documents plays a crucial role in data analysis and market research applications. This research manuscript proposes a modified version of a metaheuristic-based optimization technique, combined with k-means, for clustering text documents. In the initial phase, the input data are acquired from three benchmark databases: Reuters-21578, 20-Newsgroup, and British Broadcasting Corporation (BBC)-sport. The data are then denoised using common preprocessing techniques: stemming, lemmatization, tokenization, and stop word removal. Next, the denoised data are transformed into feature vectors using the Term Frequency (TF)-Inverse Document Frequency (IDF) technique. The computed feature vectors are given to the Modified Particle Swarm Optimization (MPSO) with k-means model, which groups closely related text documents while minimizing the similarity between different clusters. The experimental examination showed that the proposed MPSO with k-means model achieved accuracies of 0.85, 0.85, and 0.86 on the Reuters-21578, 20-Newsgroup, and BBC-sport databases, respectively, which are superior to those of the comparative models.
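The pipeline described in the abstract (preprocessing, TF-IDF vectorization, swarm-based centroid search, k-means refinement) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' implementation: the abstract does not specify the MPSO modifications, so the sketch uses a plain global-best PSO, a toy stop word list, no stemming or lemmatization, and a sum-of-squared-errors fitness. The function names, the hyperparameters (`w`, `c1`, `c2`, swarm size), the IDF variant, and the four sample documents are all assumptions made for illustration.

```python
import math
import random
from collections import Counter

STOP_WORDS = {"the", "a", "in", "of", "and", "to", "on"}  # toy list (assumption)

def tokenize(text):
    """Lowercase, split on whitespace, drop stop words (no stemming here)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for a list of strings."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({w for t in tokens for w in t})
    df = Counter(w for t in tokens for w in set(t))
    # log(N / df) + 1 is one common IDF variant, chosen here as an assumption
    idf = {w: math.log(len(docs) / df[w]) + 1.0 for w in vocab}
    vectors = []
    for t in tokens:
        tf = Counter(t)
        v = [tf[w] / (len(t) or 1) * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(vectors, centroids):
    """Index of the nearest centroid for every vector."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors]

def sse(vectors, centroids):
    """Fitness: sum of squared distances to the nearest centroid."""
    return sum(min(sq_dist(v, c) for c in centroids) for v in vectors)

def pso_centroids(vectors, k, n_particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best PSO; each particle encodes k candidate centroids."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    pos = [[list(v) for v in rng.sample(vectors, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[list(c) for c in p] for p in pos]
    pbest_fit = [sse(vectors, p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [list(c) for c in pbest[g]], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            fit = sse(vectors, pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [list(c) for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [list(c) for c in pos[i]], fit
    return gbest

def kmeans(vectors, centroids, iters=20):
    """Lloyd's k-means, started from the given centroids."""
    for _ in range(iters):
        labels = assign(vectors, centroids)
        new = []
        for j, c in enumerate(centroids):
            members = [v for v, l in zip(vectors, labels) if l == j]
            # keep the old centroid if a cluster ends up empty
            new.append([sum(col) / len(members) for col in zip(*members)] if members else c)
        if new == centroids:
            break
        centroids = new
    return assign(vectors, centroids), centroids

# Hypothetical toy corpus, not drawn from the benchmark databases
docs = ["the striker scored a late goal", "goal in the final match",
        "shares fell on the stock market", "stock market prices and shares"]
vocab, vectors = tfidf_vectors(docs)
seeds = pso_centroids(vectors, k=2)          # PSO searches for good starting centroids
labels, centroids = kmeans(vectors, seeds)   # k-means refines them
```

Using the swarm only to seed k-means is one common way to combine the two methods; the paper may interleave them differently.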

Keywords:
tf–idf, Particle swarm optimization, Computer science, Cluster analysis, Lexical analysis, Data mining, Document clustering, Metaheuristic, The Internet, Feature (linguistics), Artificial intelligence, Information retrieval, Machine learning, Term (time), World Wide Web

Metrics

Cited By: 13
FWCI (Field Weighted Citation Impact): 8.04
Refs: 40
Citation Normalized Percentile: 0.97
Is in top 1%
Is in top 10%

Topics

Data Mining and Machine Learning Applications
Physical Sciences → Computer Science → Information Systems
Text and Document Classification Technologies
Physical Sciences → Computer Science → Artificial Intelligence
Multimedia Learning Systems
Physical Sciences → Computer Science → Information Systems