Abstract

The volume of information to be managed is growing at an exponential pace, which raises the challenge of managing such large data collections effectively. The performance of a document management system can be measured on many parameters, such as the time taken to retrieve data and the similarity of documents placed in the same cluster. This paper presents an approach to automatic document categorization using a modified k-means algorithm. The proposed methodology combines a cosine similarity measure with a fuzzy k-means clustering approach, and it has been tested on three different data sets. Experimental findings suggest that the methodology is robust and produces accurate document clusters quickly.
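As a rough illustration of how a cosine similarity measure and fuzzy k-means can work together, the following is a minimal sketch and not the paper's implementation. It makes several assumptions that the abstract does not state: documents are represented as plain term-frequency vectors, the dissimilarity is taken as 1 minus the cosine similarity, the fuzzifier `m` is 2, and the initial centers are chosen by a simple farthest-point rule. The function names `tf_vectors`, `cosine_similarity`, and `fuzzy_kmeans` are hypothetical.

```python
import math
from collections import Counter


def tf_vectors(docs):
    """Turn raw strings into term-frequency vectors over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    return [[float(Counter(toks)[t]) for t in vocab] for toks in tokenized]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fuzzy_kmeans(vectors, k, m=2.0, iters=30):
    """Fuzzy k-means (fuzzy c-means) using cosine distance (assumed: 1 - cosine)."""
    eps = 1e-9  # avoids division by zero when a document coincides with a center
    dim = len(vectors[0])

    # Assumed initialization: start from the first document, then repeatedly
    # add the document that is farthest from all centers chosen so far.
    centers = [list(vectors[0])]
    while len(centers) < k:
        far = max(
            range(len(vectors)),
            key=lambda i: min(1.0 - cosine_similarity(vectors[i], c) for c in centers),
        )
        centers.append(list(vectors[far]))

    def memberships_for(current_centers):
        rows = []
        for v in vectors:
            dists = [max(1.0 - cosine_similarity(v, c), eps) for c in current_centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            rows.append([w / total for w in inv])  # each row sums to 1
        return rows

    for _ in range(iters):
        u = memberships_for(centers)
        for j in range(k):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            # New center: membership-weighted mean of all document vectors.
            centers[j] = [
                sum(w * v[i] for w, v in zip(weights, vectors)) / total
                for i in range(dim)
            ]
    return memberships_for(centers), centers


docs = [
    "cat dog pet",
    "dog cat animal pet",
    "stock market trade",
    "market stock price trade",
]
memberships, centers = fuzzy_kmeans(tf_vectors(docs), k=2)
for doc, row in zip(docs, memberships):
    print(doc, [round(x, 3) for x in row])
```

Unlike hard k-means, each document receives a degree of membership in every cluster, so a document that spans two topics is not forced into one of them. A hard assignment can still be read off by taking the cluster with the largest membership.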

Keywords:
Cluster analysis; Cosine similarity; Categorization; Computer science; Data mining; Similarity (geometry); Pace; Document clustering; Measure (data warehouse); Fuzzy clustering; Text categorization; Similarity measure; k-means clustering; Volume (thermodynamics); Cluster (spacecraft); Fuzzy logic; Pattern recognition (psychology); Artificial intelligence; Algorithm

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.00
Refs: 16
Citation Normalized Percentile: 0.32

Topics

Advanced Algorithms and Applications
Physical Sciences →  Engineering →  Control and Systems Engineering
Wireless Sensor Networks and IoT
Physical Sciences →  Engineering →  Control and Systems Engineering
Advanced Sensor and Control Systems
Physical Sciences →  Engineering →  Control and Systems Engineering
