Rameela Azeez, Surangika Ranathunga
For English, Named Entity Recognition (NER) is more or less a solved problem. However, for low-resourced and morphologically rich languages such as Sinhala, minimal research has been done. In this paper, we present a novel fine-grained Named Entity (NE) tag set and an NE-annotated Sinhala corpus of 70k word tokens. We trained a custom NER model for Sinhala based on Conditional Random Fields (CRF). Despite the low-resourced setting, this NER model achieved a micro-averaged F1 score of 84.8.
Viorica-Camelia Lupancu, Adrian Iftene
Lola Khudoyberdieva, Banu Diri
Jintao Tang, Chengxian Zhang, Shasha Li, Ting Wang
Huiming Zhu, Chunhui He, Yang Fang, Weidong Xiao