Low-Resource Machine Translation Systems for Indic Languages

Ivana Kvapilíková; Ondřej Bojar

doi:10.18653/v1/2023.wmt-1.90

JOURNAL ARTICLE

Year: 2023 Pages: 954-958

Abstract

We present our submission to the WMT23 shared task in translation between English and Assamese, Khasi, Mizo and Manipuri. All our systems were pretrained on the task of multilingual masked language modelling and denoising auto-encoding. Our primary systems for translation into English were further pretrained for multilingual MT in all four language directions and fine-tuned on the limited parallel data available for each language pair separately. We used online back-translation for data augmentation. The same systems were submitted as contrastive systems for translation out of English, because the multilingual MT pretraining step seemed to harm the translation performance. Our primary systems for translation out of English were trained without the multilingual MT pretraining step. Other contrastive systems used additional pseudo-parallel data mined from monolingual corpora for pretraining.

Keywords:

Computer science; Machine translation; Natural language processing; Task (project management); Assamese; Artificial intelligence; Rule-based machine translation; Machine translation software usability; Machine translation system; Example-based machine translation; Speech recognition; Linguistics; Engineering

Metrics

FWCI (Field Weighted Citation Impact): 0.51

Citation Normalized Percentile: 0.68

Topics

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

IACS-LRILT: Machine Translation for Low-Resource Indic Languages

MTNLP-IIITH: Machine Translation for Low-Resource Indic Languages

MUNI-NLP Systems for Low-resource Indic Machine Translation

Low-Resource Indic Languages Translation Using Multilingual Approaches

DLUT-NLP Machine Translation Systems for WMT24 Low-Resource Indic Language Translation