JOURNAL ARTICLE

An Efficient Concept-Based Mining Model for Enhancing Text Clustering

Shady Shehata, Fakhri Karray, Mohamed S. Kamel

Year: 2009   Journal: IEEE Transactions on Knowledge and Data Engineering   Vol: 22 (10)   Pages: 1360-1371   Publisher: IEEE Computer Society

Abstract

Most of the common techniques in text mining are based on the statistical analysis of a term, either word or phrase. Statistical analysis of a term frequency captures the importance of the term within a document only. However, two terms can have the same frequency in their documents, but one term contributes more to the meaning of its sentences than the other term. Thus, the underlying text mining model should indicate terms that capture the semantics of text. In this case, the mining model can capture terms that present the concepts of the sentence, which leads to discovery of the topic of the document. A new concept-based mining model that analyzes terms on the sentence, document, and corpus levels is introduced. The concept-based mining model can effectively discriminate between nonimportant terms with respect to sentence semantics and terms which hold the concepts that represent the sentence meaning. The proposed mining model consists of sentence-based concept analysis, document-based concept analysis, corpus-based concept-analysis, and concept-based similarity measure. The term which contributes to the sentence semantics is analyzed on the sentence, document, and corpus levels rather than the traditional analysis of the document only. The proposed model can efficiently find significant matching concepts between documents, according to the semantics of their sentences. The similarity between documents is calculated based on a new concept-based similarity measure. The proposed similarity measure takes full advantage of using the concept analysis measures on the sentence, document, and corpus levels in calculating the similarity between documents. Large sets of experiments using the proposed concept-based mining model on different data sets in text clustering are conducted. The experiments demonstrate extensive comparison between the concept-based analysis and the traditional analysis. 
Experimental results demonstrate the substantial enhancement of the clustering quality using the sentence-based, document-based, corpus-based, and combined approach concept analysis.
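
The abstract's central idea, that a term can be analyzed on the sentence, document, and corpus levels rather than on the document level alone, can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not specify how concepts are identified or how the measures are defined, so the code below assumes plain lowercase word matching as a stand-in for concept extraction and a naive punctuation-based sentence splitter. The function name `term_level_counts` and the sample corpus are hypothetical.

```python
import re
from collections import Counter


def term_level_counts(corpus, term):
    """Count `term` per sentence, per document, and across the corpus.

    corpus: list of documents, each a plain string.
    Returns (per_document, document_frequency).
    Assumption: raw word counts stand in for concept analysis.
    """
    term = term.lower()
    per_document = []
    for text in corpus:
        # Naive sentence split on '.', '!' or '?' (an illustrative assumption).
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        # Sentence level: occurrences of the term in each sentence.
        per_sentence = [
            Counter(re.findall(r"[a-z]+", s.lower()))[term] for s in sentences
        ]
        per_document.append({
            "sentence_counts": per_sentence,
            # Document level: total occurrences in the document.
            "document_count": sum(per_sentence),
        })
    # Corpus level: number of documents that contain the term at all.
    document_frequency = sum(1 for d in per_document if d["document_count"] > 0)
    return per_document, document_frequency


corpus = [
    "Clustering groups documents. Clustering needs a similarity measure.",
    "The measure compares documents.",
]
stats, df = term_level_counts(corpus, "clustering")
```

Keeping the three counts separate makes it possible to see how a term is distributed across sentences, which a single document-level frequency hides. A similarity measure along the lines the abstract describes would combine such counts, but the combination itself is not given here.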

Keywords:
Computer science, Sentence, Natural language processing, Phrase, Term (time), Similarity (geometry), Semantics (computer science), Cluster analysis, Artificial intelligence, Document clustering, Meaning (existential), Word (group theory), Measure (data warehouse), Information retrieval, Similarity measure, Semantic similarity, Data mining, Linguistics

Metrics

Cited By: 131
FWCI (Field Weighted Citation Impact): 18.07
Refs: 41
Citation Normalized Percentile: 0.99
Is in top 1%
Is in top 10%

Topics

Web Data Mining and Analysis
Physical Sciences → Computer Science → Information Systems
Natural Language Processing Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Topic Modeling
Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Enhancing Text Clustering Using Concept-based Mining Model

Shady Shehata, Fakhri Karray, Mohamed S. Kamel

Journal: Proceedings   Year: 2006   Pages: 1043-1048

JOURNAL ARTICLE

TEXT CLUSTERING IN CONCEPT BASED MINING

Pradnya Randive, Nitin Pise

Journal: International Journal of Computer and Communication Technology   Year: 2016   Pages: 32-34

JOURNAL ARTICLE

Concept Based Mining in Text Clustering

Pradnya Randive, Nitin Pise

Journal: International Journal Of Recent Advances in Engineering & Technology   Year: 2020   Vol: 08 (03)   Pages: 1-4

JOURNAL ARTICLE

An Efficient Concept Based Mining Model for Web Page Clustering

Mr. P. S Gamare, Mr. Sandip B. Khedkar, Mr. Maheshwar A. Panindre, Mr. Ketan D. Bhatkar

Journal: International Journal of Engineering Trends and Technology   Year: 2015   Vol: 21 (4)   Pages: 219-221

JOURNAL ARTICLE

An efficient concept-based retrieval model for enhancing text retrieval quality

Shady Shehata, Fakhri Karray, Mohamed S. Kamel

Journal: Knowledge and Information Systems   Year: 2012   Vol: 35 (2)   Pages: 411-434