JOURNAL ARTICLE

AmhEn: Amharic-English Large Parallel Corpus for Machine Translation

Abstract

Recently, deep neural networks have received great attention for machine translation (MT) tasks. These networks need large amounts of data to learn abstract representations of their input and store them as continuous vectors. However, very few studies have addressed low-resource languages such as Amharic. Progress on Amharic-English machine translation, in both directions, is held back by the lack of clean, easily accessible, and up-to-date parallel corpora. This paper presents the first relatively large-scale Amharic-English parallel corpus (over 1.1 million sentence pairs) for machine translation tasks. To investigate the usability of the dataset, we ran experiments with recurrent neural network (RNN) and Transformer models under various hyper-parameter settings. Additionally, we explore the effects of Amharic homophone character normalization on machine translation. We have released the dataset in both unnormalized and normalized forms, and it is available as separate training, validation, and test split files.
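The abstract does not spell out which characters are merged during homophone normalization. As a rough illustration only, the sketch below assumes a mapping that is common in Amharic NLP preprocessing: the ሐ, ኀ and ኸ series are folded into ሀ, ሠ into ሰ, ዐ into አ, and ፀ into ጸ. The series list, the names `HOMOPHONE_SERIES`, `build_table` and `normalize_amharic`, and the restriction to the seven basic vowel orders are all assumptions for illustration, not the authors' published procedure. It relies on the Unicode Ethiopic block laying out each consonant's vowel orders as consecutive code points.

```python
"""Hypothetical sketch of Amharic homophone character normalization.

The mapping below is an ASSUMPTION (a commonly used one), not the exact
mapping used to build the normalized release of the corpus.
"""

# In the Unicode Ethiopic block, each consonant series occupies consecutive
# code points; offsets 0-6 hold its seven basic vowel orders.
VOWEL_ORDERS = 7

# Assumed mapping: (first code point of redundant series,
#                   first code point of canonical series)
HOMOPHONE_SERIES = [
    (0x1210, 0x1200),  # ሐ series -> ሀ series
    (0x1280, 0x1200),  # ኀ series -> ሀ series
    (0x12B8, 0x1200),  # ኸ series -> ሀ series
    (0x1220, 0x1230),  # ሠ series -> ሰ series
    (0x12D0, 0x12A0),  # ዐ series -> አ series
    (0x1340, 0x1338),  # ፀ series -> ጸ series
]


def build_table() -> dict[int, int]:
    """Map each vowel order of every redundant series to the same order of its canonical series."""
    table: dict[int, int] = {}
    for src_base, dst_base in HOMOPHONE_SERIES:
        for order in range(VOWEL_ORDERS):
            table[src_base + order] = dst_base + order
    return table


NORMALIZATION_TABLE = build_table()


def normalize_amharic(text: str) -> str:
    """Replace homophone characters with one canonical form; other characters pass through."""
    return text.translate(NORMALIZATION_TABLE)


if __name__ == "__main__":
    # "ፀሐይ" (sun) becomes "ጸሀይ" under the assumed mapping.
    print(normalize_amharic("ፀሐይ"))
```

Mapping whole series rather than single characters keeps the vowel order intact, so a syllable such as ሖ (ho) becomes ሆ rather than collapsing to the base form. Applying the same function to both training and test text matters, since a model trained on normalized data would otherwise see unseen spellings at test time.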

Keywords:
Machine translation; Normalization; Transformer; Example-based machine translation; Artificial neural network; Parallel corpora; Usability; Deep learning

Topics

Natural Language Processing Techniques
Language, Linguistics, Cultural Analysis
Translation Studies and Practices
