JOURNAL ARTICLE

Web Document Clustering by Using Automatic Keyphrase Extraction

Juhyun Han, Tae-Hwan Kim, Joongmin Choi

Year: 2007 Journal:   2007 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Workshops

Abstract

In most traditional document clustering techniques, the total number of clusters is not known in advance, and the cluster that contains the target information cannot be determined because no semantic description is associated with each cluster. The well-known K-means clustering algorithm partially solves these problems by allowing users to specify the number of clusters. However, if the pre-specified number of clusters is modified, the precision of each result also changes. To solve this problem, this paper proposes a new clustering algorithm based on the Kea keyphrase extraction algorithm, which returns several keyphrases from the source documents by using machine learning techniques. In this paper, documents are grouped into several clusters as in K-means, but the number of clusters is determined automatically by the algorithm, using heuristics over the extracted keyphrases. Our Kea-means clustering algorithm provides easy and efficient ways to extract test documents from massive quantities of resources.
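The idea of letting extracted keyphrases decide the number of clusters can be illustrated with a minimal sketch. This is not the paper's Kea-means algorithm, whose heuristics the abstract does not spell out. It assumes the keyphrases have already been extracted (for example by Kea, which is not run here), and the function name `cluster_by_keyphrases`, the `min_support` threshold, and the `"unclustered"` fallback label are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def cluster_by_keyphrases(doc_keyphrases, min_support=2):
    """Group documents by shared keyphrases; the cluster count is not preset.

    doc_keyphrases: dict mapping a document id to a list of keyphrases,
    assumed to come from an extractor such as Kea (not run here).
    min_support: hypothetical heuristic; a keyphrase labels a cluster only
    if at least this many documents contain it.
    """
    # Count how many documents mention each keyphrase.
    support = Counter()
    for phrases in doc_keyphrases.values():
        support.update(set(phrases))

    # Keyphrases with enough support become cluster labels.
    labels = {p for p, n in support.items() if n >= min_support}

    clusters = defaultdict(list)
    for doc_id, phrases in doc_keyphrases.items():
        candidates = [p for p in phrases if p in labels]
        if candidates:
            # Pick the best-supported label; ties go to the alphabetically first.
            best = min(candidates, key=lambda p: (-support[p], p))
        else:
            # Documents sharing no labelled keyphrase are set aside.
            best = "unclustered"
        clusters[best].append(doc_id)
    return dict(clusters)


# Toy input: keyphrases are made up for this example.
docs = {
    "d1": ["web mining", "clustering"],
    "d2": ["clustering", "k-means"],
    "d3": ["keyphrase extraction", "kea"],
    "d4": ["kea", "machine learning"],
    "d5": ["ontology"],
}
result = cluster_by_keyphrases(docs)
```

Here the number of clusters falls out of the data (two labelled clusters plus a leftover group) rather than being passed in as K, and each cluster carries a keyphrase as a readable label, which is the property the abstract contrasts with plain K-means.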

Keywords:
Computer science, Cluster analysis, Artificial intelligence, Extraction (chemistry), Information retrieval

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.39

Topics

Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Web Document Clustering by Using Automatic Keyphrase Extraction

Juhyun Han, Tae-Hwan Kim, Joongmin Choi

Journal: 2007 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Workshops Year: 2007 Pages: 56-59

JOURNAL ARTICLE

Automatic Multi-Document Arabic Text Summarization Using Clustering and Keyphrase Extraction

Hamzah Noori Fejer, Nazlia Omar

Journal: Journal of Artificial Intelligence Year: 2014 Vol: 8 (1) Pages: 1-9

BOOK-CHAPTER

CorePhrase: Keyphrase Extraction for Document Clustering

Khaled M. Hammouda, Diego N. Matute, Mohamed S. Kamel

Lecture notes in computer science Year: 2005 Pages: 265-274