Text Document Categorization using Modified K-Means Clustering Algorithm

Mehdi Allahyari; Seyedamin Pouriyeh; Mehdi Assefi; Saied Safaei; Elizabeth Trippe; Juan Gutierrez; Krys Kochut; M Berry; A Jain; M Ng; J Huang; L Jing; A Huang; J Neuhaus; Kalbfleisch; R Tibshirani; G Walther; T Hastie; J Rasson; T Kubushishi; D Pham; S Dimov; C Nguyen; M Revanasiddappa; B Harish; S Nasser; C Sreejith; M Irshad; S Lata; M Loar; R Bahsoon; T Chen; K Li; R Buyya; A Vashishtha; S Kumar; P Verma; R Porwal

doi:10.35940/ijrte.b1095.0782s719

JOURNAL ARTICLE

Year: 2019; Journal: International Journal of Recent Technology and Engineering (IJRTE); Vol: 8 (2S7); Pages: 508-511

Abstract

The volume of information to be managed is increasing at an exponential pace, and the challenge is how to manage such large data effectively. The performance of such a system can be measured on many parameters, such as the time taken to retrieve data and the similarity of documents placed in the same cluster. This paper presents an approach for automatic document categorization using a modified k-means algorithm. The proposed methodology has been tested on three different data sets. Experimental findings suggest that it is accurate and robust in creating clusters of documents. The methodology uses a cosine similarity measure and a fuzzy k-means clustering approach to produce results quickly and accurately.
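The abstract names cosine similarity and fuzzy k-means as the two ingredients but does not say how the authors combine or modify them. The following is only a minimal sketch, assuming the standard fuzzy c-means update rules with cosine distance (1 minus cosine similarity) substituted for Euclidean distance; it is not the paper's modified algorithm. The names `tf_vector`, `fuzzy_kmeans`, and `seeds`, the fuzzifier value `m=2.0`, and the toy documents are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tf_vector(text, vocab):
    """Raw term-frequency vector of `text` over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def fuzzy_kmeans(vectors, seeds, m=2.0, iters=30, eps=1e-9):
    """Fuzzy k-means with cosine distance (an assumed, generic variant).

    `seeds` lists the indices of the vectors used as initial centroids.
    Returns (memberships, centroids); memberships[i][j] is the degree to
    which document i belongs to cluster j, and each row sums to 1.
    """
    centroids = [list(vectors[i]) for i in seeds]
    dim = len(vectors[0])
    memberships = []
    for _ in range(iters):
        # Membership update: closer centroids receive higher membership.
        memberships = []
        for v in vectors:
            dists = [max(1.0 - cosine_similarity(v, c), eps) for c in centroids]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            memberships.append([x / total for x in inv])
        # Centroid update: mean of all vectors weighted by membership ** m.
        for j in range(len(centroids)):
            weights = [u[j] ** m for u in memberships]
            wsum = sum(weights)
            centroids[j] = [
                sum(w * v[d] for w, v in zip(weights, vectors)) / wsum
                for d in range(dim)
            ]
    return memberships, centroids


# Toy corpus (hypothetical): two documents about pets, two about markets.
docs = ["cat dog cat", "dog cat pet", "stock market trade", "market stock price"]
vocab = sorted({w for d in docs for w in d.split()})
vectors = [tf_vector(d, vocab) for d in docs]

memberships, centroids = fuzzy_kmeans(vectors, seeds=[0, 2])
# Hard label per document: the cluster with the highest membership.
labels = [row.index(max(row)) for row in memberships]
print(labels)
```

Unlike hard k-means, each document keeps a graded membership in every cluster, so a document that mixes topics is not forced into a single category until the final argmax step. Cosine distance is a natural fit for term vectors because it ignores document length and compares only the direction of the vectors.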

Keywords:

Cluster analysis; Cosine similarity; Categorization; Computer science; Data mining; Similarity (geometry); Pace; Document clustering; Measure (data warehouse); Fuzzy clustering; Text categorization; Similarity measure; k-means clustering; Volume (thermodynamics); Cluster (spacecraft); Fuzzy logic; Pattern recognition (psychology); Artificial intelligence; Algorithm

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.32

Topics

Advanced Algorithms and Applications

Physical Sciences → Engineering → Control and Systems Engineering

Wireless Sensor Networks and IoT

Physical Sciences → Engineering → Control and Systems Engineering

Advanced Sensor and Control Systems

Physical Sciences → Engineering → Control and Systems Engineering

Related Documents

Document Clustering Algorithm using Modified K-Means

An Approach for Text Clustering Using Modified K-means Algorithm

Text document clustering using mayfly optimization algorithm with k-means technique

Text Document Clustering Using K-means Algorithm with Dimension Reduction Techniques

Improved Document Clustering using k-means algorithm