Sinhala Document Clustering Using Named Entity Recognition Technique

T. D. C. Peiris; Dinesh Asanka

doi:10.1109/icarc61713.2024.10499763

JOURNAL ARTICLE

Year: 2024 Pages: 179-183

Abstract

This paper addresses a gap in tooling for the large and growing volume of Sinhala news and articles: no dedicated tool or framework clusters Sinhala news articles using Named Entity Recognition (NER). Existing systems struggle to process this volume of information and to provide services such as duplicate reduction, content summarization, and improved search. To bridge the gap, the paper introduces a framework that clusters Sinhala news articles by the named entities identified in each article of the collection. The study also compares several deep learning NER approaches and evaluates their impact on Sinhala clustering. The findings indicate that a combined approach can yield higher clustering accuracy than the conventional use of individual deep learning techniques. The research contributes both to understanding how effective NER is for Sinhala news article clustering and to the broader work on optimizing information retrieval in languages that have received limited research attention. The paper concludes by advocating integrated deep learning approaches to improve the clustering of Sinhala news articles, and is intended as a resource for researchers, practitioners, and developers in natural language processing.
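
The idea of clustering articles by their named entities can be illustrated with a minimal sketch. This is not the authors' framework: the abstract does not specify the similarity measure, the clustering algorithm, or the NER models, so the sketch assumes Jaccard similarity over entity sets and single-link grouping with a hypothetical threshold. The function names, the threshold value, and the sample data (English placeholders standing in for Sinhala entity strings produced by some NER step) are all illustrative assumptions.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two entity sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_entities(doc_entities: dict, threshold: float = 0.3) -> list:
    """Group documents whose entity sets overlap at or above `threshold`.

    Single-link clustering via union-find: any pair of documents that is
    similar enough ends up in the same cluster. The threshold is an
    illustrative assumption, not a value taken from the paper.
    """
    parent = {doc_id: doc_id for doc_id in doc_entities}

    def find(x):
        # Follow parent links to the cluster root, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the clusters of every sufficiently similar pair of documents.
    for a, b in combinations(doc_entities, 2):
        if jaccard(doc_entities[a], doc_entities[b]) >= threshold:
            parent[find(a)] = find(b)

    # Collect documents under their cluster root.
    clusters = {}
    for doc_id in doc_entities:
        clusters.setdefault(find(doc_id), []).append(doc_id)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical NER output: article id -> set of recognized entities.
articles = {
    "a1": {"Colombo", "Central Bank of Sri Lanka"},
    "a2": {"Colombo", "Central Bank of Sri Lanka", "Nandalal Weerasinghe"},
    "a3": {"Kandy", "Temple of the Tooth"},
}
clusters = cluster_by_entities(articles)
print(clusters)
```

Here articles a1 and a2 share two of three distinct entities and are grouped together, while a3 shares none and stays alone. A combined NER approach of the kind the abstract describes would only change how the entity sets are produced; the grouping step would stay the same under these assumptions.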

Keywords:

Computer science; Cluster analysis; Named-entity recognition; Artificial intelligence; Natural language processing; Document clustering; Information retrieval; Entity linking; Pattern recognition (psychology); Engineering; Task (project management)

Topics

Topic Modeling: Physical Sciences → Computer Science → Artificial Intelligence

Web Data Mining and Analysis: Physical Sciences → Computer Science → Information Systems

Advanced Text Analysis Techniques: Physical Sciences → Computer Science → Artificial Intelligence
