JOURNAL ARTICLE

Exploring Neural Machine Translation for Sinhala-Tamil Languages Pair

Abstract

In the face of rapid globalization, translation plays a vital role in sustaining native languages. Most Neural Machine Translation (NMT) research in Natural Language Processing has achieved impressive results by training on parallel corpora. Low-resource languages suffer from poor performance because parallel corpus data is scarce. Creating a parallel corpus for a language pair is expensive and requires people with expert knowledge of both languages. In this research, we examine the feasibility of developing a translator for the Sinhala-Tamil language pair using monolingual corpora. Byte Pair Encoding (BPE) is applied to overcome the Out-Of-Vocabulary (OOV) problem in both Sinhala and Tamil. The first part of the research uses a monolingual word embedding approach to develop translation between Sinhala and Tamil using only monolingual corpora. In the second part, we use both parallel and monolingual corpus data with the transformer architecture. The BLEU score and synonym analysis are used to evaluate the proposed approach.
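
To illustrate how BPE addresses the OOV problem mentioned in the abstract, the following is a minimal sketch of the general technique, not the authors' implementation; the abstract does not say which BPE tooling or settings were used. The function names (learn_bpe, merge_pair, segment), the toy Latin-script corpus, and the "</w>" end-of-word marker are all assumptions made for illustration. The sketch splits words into Unicode code points, which is a simplification for Sinhala and Tamil, where combining vowel signs would usually be handled more carefully.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from an iterable of training words."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge to the whole vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word into subword units using learned merges."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy usage: "lowest" never appears in the training words, yet it can still
# be segmented into subword units instead of becoming an unknown token.
toy_corpus = ["low", "low", "lower", "newest", "newest", "widest"]
toy_merges = learn_bpe(toy_corpus, 10)
print(segment("lowest", toy_merges))
```

Because any word can fall back to smaller units down to single characters, a word absent from the training data is still representable, which is the property that makes BPE useful for morphologically rich, low-resource languages.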

Keywords:
Tamil; Computer science; Natural language processing; Machine translation; Artificial intelligence; Vocabulary; Artificial neural network; Linguistics

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 0.29
Refs: 39
Citation Normalized Percentile: 0.66

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition