Named Entity Recognition (NER) is a major subtask that must be solved in most Natural Language Processing tasks. However, building a reliable NER system is challenging, especially for Indic languages such as Sinhala, because of language features such as the absence of capitalization. Since there has been little previous work on NER for Sinhala, the approach and the required resources have to be built from scratch. This paper evaluates the effectiveness of data-driven techniques for detecting Named Entities in Sinhala text. Conditional Random Fields (CRF) and Maximum Entropy (ME) models were applied to this task, and the former outperformed the latter in all experiments. A CRF model is able to detect Sinhala Named Entities with very high precision (91.64%) and reasonable recall (69.34%).
Rameela Azeez, Surangika Ranathunga
W.M.S.K. Wijesinghe, Muditha Tissera