JOURNAL ARTICLE

Named entity recognition for Sinhala language

Abstract

Named Entity Recognition (NER) is a major subtask that must be solved in most Natural Language Processing applications. However, building a reliable NER system is challenging, especially for Indic languages such as Sinhala, because of language features such as the absence of capitalization. Since there has been little previous work on NER for Sinhala, the approach and the required resources have to be built from scratch. This paper evaluates the effectiveness of data-driven techniques for detecting named entities in Sinhala text. Conditional Random Fields (CRF) and Maximum Entropy (ME) models were applied to this task, and the former outperformed the latter in all experiments. A CRF model is able to detect Sinhala named entities with high precision (91.64%) and reasonable recall (69.34%).
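For readers unfamiliar with the two rates quoted in the abstract, the following is a minimal sketch of how entity-level precision and recall are commonly computed. It is not taken from the paper: the span representation `(start, end, label)`, the function name `precision_recall_f1`, and the toy data are all assumptions made for illustration, and the paper's own scoring scheme may differ.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted entity spans against gold spans by exact match.

    Both arguments are iterables of (start, end, label) tuples; this
    representation is an assumption for illustration only.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)

    # Precision: share of predicted entities that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold entities that were found.
    recall = true_positives / len(gold) if gold else 0.0

    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data, not drawn from the paper's corpus.
gold = {(0, 1, "PERSON"), (4, 5, "LOCATION"),
        (7, 9, "ORGANIZATION"), (12, 13, "PERSON")}
predicted = {(0, 1, "PERSON"), (4, 5, "LOCATION"), (10, 11, "LOCATION")}

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

A tagger that predicts few entities but gets most of them right scores high on precision and lower on recall, which is the pattern the abstract reports for the CRF model.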

Keywords:
Named-entity recognition; Computer science; Conditional random field; Natural language processing; Artificial intelligence; Task (project management); Named entity; Recall rate; Language model; Recall; Natural language; Entropy (arrow of time); Principle of maximum entropy; Precision and recall; Entity linking; Information retrieval; Linguistics

Metrics

Cited By: 15
FWCI (Field Weighted Citation Impact): 1.93
Refs: 19
Citation Normalized Percentile: 0.88

Topics

Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence