This paper addresses a gap in existing tooling for handling the large volume of Sinhala news and article data. No dedicated tool or framework currently clusters Sinhala news articles based on Named Entity Recognition (NER), even though such clustering is pivotal to managing data at this scale efficiently. Existing systems struggle to process this volume of information and to provide streamlined services such as duplicate reduction, content summarization, and improved search. To bridge this gap, the paper introduces a framework that clusters Sinhala news articles according to the named entities identified in each article of the collection. The study also presents a comparative analysis of several deep learning NER approaches and evaluates their impact on the clustering of Sinhala text. The findings highlight the potential of a combined approach, which showed higher clustering accuracy than the conventional deployment of individual deep learning techniques. The research contributes both to understanding the efficacy of NER in Sinhala news article clustering and to the broader discourse on optimizing information retrieval systems for languages that have received limited research attention. In conclusion, it advocates the adoption of integrated deep learning approaches to improve the clustering of Sinhala news articles, and it serves as a resource for researchers, practitioners, and developers in natural language processing.
Senthamizh Selvan. R, K. Arutchelvan
J. K. Dahanayaka, A. R. Weerasinghe
Rameela Azeez, Surangika Ranathunga