Frequent term based peer-to-peer text clustering

Qing He; Tingting Li; Fuzhen Zhuang; Zhongzhi Shi

doi:10.1109/kam.2010.5646177

Year: 2010 Pages: 352-355

Abstract

Text clustering is an important technology for automatically structuring large document collections, and it is even more valuable in peer-to-peer networks. Because documents are high-dimensional, much communication can be saved if each node obtains an approximate clustering result through a distributed algorithm instead of transferring all documents to a central node for clustering. Most existing text clustering algorithms for unstructured peer-to-peer networks are based on the K-means algorithm. A problem with these algorithms is that clustering quality may decrease as the network grows. In this paper, we propose a text clustering algorithm for peer-to-peer networks based on frequent term sets. It requires relatively low communication volume, and the quality of its clustering result is not affected by the size of the network. Moreover, it produces a term set describing each cluster, which gives people a clear understanding of the clustering result and helps users find resources in the network or manage their local documents consistently with the whole network.
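The abstract does not spell out the algorithm, so what follows is only a minimal sketch of the general idea under stated assumptions, not the authors' method. Peers mine globally frequent term sets level by level (Apriori-style) by exchanging only candidate counts, and each peer then labels its local documents with the most specific frequent term set they contain. Summing per-peer counters stands in for whatever network aggregation protocol would actually be used; the function names, the `min_support` and `max_size` parameters, and the assignment rule are all hypothetical. Because the aggregated counts are exact sums, the result of this sketch does not depend on how the documents are split among peers, which mirrors the property the abstract claims.

```python
from collections import Counter
from itertools import combinations


def count_term_sets(docs, candidates):
    """On one peer: count the local documents that contain each candidate term set."""
    counts = Counter()
    for doc in docs:
        for cand in candidates:
            if cand <= doc:  # every term of the candidate occurs in the document
                counts[cand] += 1
    return counts


def global_frequent_term_sets(peers, min_support, max_size=2):
    """Level-wise (Apriori-style) mining of term sets that occur in at least
    `min_support` documents across all peers.

    ASSUMPTION: summing the per-peer Counters stands in for a network-wide
    aggregation protocol. Only candidate counts would be exchanged, never the
    documents themselves.
    """
    # ASSUMPTION: the vocabulary is known to every peer (it could also be aggregated).
    vocab = set().union(*(doc for docs in peers for doc in docs))
    candidates = [frozenset([term]) for term in sorted(vocab)]
    frequent = {}
    for size in range(1, max_size + 1):
        total = Counter()
        for docs in peers:
            total.update(count_term_sets(docs, candidates))
        level = {s: c for s, c in total.items() if c >= min_support}
        if not level:
            break
        frequent.update(level)
        # Next-level candidates: unions of two frequent sets that are exactly one term larger.
        candidates = list({a | b for a, b in combinations(level, 2) if len(a | b) == size + 1})
    return frequent


def assign_clusters(docs, frequent):
    """Label each local document with the most specific frequent term set it contains.

    HYPOTHETICAL rule: prefer larger term sets, then higher global support, then
    the alphabetically smallest term list. Documents that contain no frequent
    term set get the empty label ().
    """
    ranked = sorted(frequent, key=lambda s: (-len(s), -frequent[s], sorted(s)))
    labels = []
    for doc in docs:
        label = next((s for s in ranked if s <= doc), frozenset())
        labels.append(tuple(sorted(label)))
    return labels


if __name__ == "__main__":
    docs = [
        {"peer", "network", "overlay"},
        {"peer", "network", "routing"},
        {"text", "cluster", "term"},
        {"text", "cluster", "kmeans"},
        {"peer", "overlay", "gossip"},
    ]
    # The same five documents held by one peer, then spread over three peers.
    for peers in ([docs], [docs[:2], docs[2:4], docs[4:]]):
        frequent = global_frequent_term_sets(peers, min_support=2)
        print([label for p in peers for label in assign_clusters(p, frequent)])
```

Both iterations print `[('network', 'peer'), ('network', 'peer'), ('cluster', 'text'), ('cluster', 'text'), ('overlay', 'peer')]`, and each label doubles as a human-readable description of its cluster. A real system would also need to bound how many candidate counts cross the network, which this sketch does not attempt.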

Keywords:

Cluster analysis, Computer science, Term, Data stream clustering, Correlation clustering, Data mining, Peer-to-peer, CURE data clustering algorithm, Node, Canopy clustering algorithm, Set (abstract data type), Document clustering, Artificial intelligence, Distributed computing

Metrics

FWCI (Field Weighted Citation Impact): 0.74

Citation Normalized Percentile: 0.72

Topics

Peer-to-Peer Network Technologies

Physical Sciences → Computer Science → Computer Networks and Communications

Data Mining Algorithms and Applications

Physical Sciences → Computer Science → Information Systems

Data Management and Algorithms

Physical Sciences → Computer Science → Signal Processing

Related Documents

Frequent term-based text clustering

Text Clustering for Peer-to-Peer Networks with Probabilistic Guarantees

IP-based Clustering for Peer-to-Peer Overlays

Frequent Term-Based Text Clustering Using Hidden Support