JOURNAL ARTICLE

Document Clustering with Map Reduce using Hadoop Framework

Manthan Chelenahalli Satish

Year: 2015 Journal: International Journal on Recent and Innovation Trends in Computing and Communication Vol: 3 (1) Pages: 409-413

Abstract

Big data is a collection of data sets so large and complex that it becomes difficult to process and analyse using conventional database management tools or traditional data processing applications. Big data poses many challenges; a central one is storing data and retrieving it through search engines. Document data is also growing rapidly in the era of the internet, and analysing it is important for many applications. Document clustering is one of the important techniques for analysing document data. Its applications include organizing large document collections, finding similar documents, recommendation systems, duplicate content detection, and search optimization. This work is motivated by the recognition of the need for efficient retrieval of data from massive data repositories through search engines. It focuses on clustering collections of documents efficiently using MapReduce. Keywords: Document Clustering, Map-Reduce, Hadoop, Document pre-processing

Keywords:
Computer science, Cluster analysis, Big data, Document clustering, Information retrieval, Data mining, The Internet, Search engine, Database, World Wide Web, Artificial intelligence

Metrics

Cited By: 9
FWCI (Field Weighted Citation Impact): 0.63
Refs: 9
Citation Normalized Percentile: 0.86

Topics

Data Stream Mining Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Cloud Computing and Resource Management
Physical Sciences →  Computer Science →  Information Systems
Caching and Content Delivery
Physical Sciences →  Computer Science →  Computer Networks and Communications
