Neural Machine Translation (NMT) has become the state-of-the-art machine translation technique. Although NMT is successful for high-resource languages, its applicability in low-resource settings is still debatable. In this paper, we address the task of developing an NMT system for the most widely used language pair in Sri Lanka, Sinhala and Tamil, focusing on the domain of official government documents. We explore ways of improving NMT using word phrases when the parallel corpus is considerably small, and empirically show that the resulting models improve our benchmark domain-specific Sinhala-to-Tamil and Tamil-to-Sinhala translation models by 0.68 and 5.4 BLEU, respectively. The paper also presents an analysis of how NMT performance varies with the number of word phrases, in order to investigate their effect on domain-specific NMT.
L. N. A. S. H. Nissanka, B. H. R. Pushpananda, A. R. Weerasinghe