Morphological Word Embeddings for Arabic Neural Machine Translation in Low-Resource Settings

Pamela Shapiro; Kevin Duh

doi:10.18653/v1/w18-1201

Year: 2018 Pages: 1-11

Abstract

Neural machine translation has achieved impressive results in the last few years, but its success has been limited to settings with large amounts of parallel data. One way to improve NMT for lower-resource settings is to initialize a word-based NMT model with pretrained word embeddings. However, rare words still suffer from lower-quality word embeddings when trained with standard word-level objectives. We introduce word embeddings that utilize morphological resources, and compare to purely unsupervised alternatives. We work with Arabic, a morphologically rich language with available linguistic resources, and perform Ar-to-En MT experiments on a small corpus of TED subtitles. We find that word embeddings utilizing subword information consistently outperform standard word embeddings on a word similarity task and as initialization of the source word embeddings in a low-resource NMT system.
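
The abstract contrasts standard word-level embeddings with embeddings built from subword information, either purely unsupervised units (such as character n-grams) or units supplied by a morphological resource, and uses them to initialize the NMT source embeddings. The sketch below illustrates that general idea only; it is not the authors' code. The hash function (FNV-1a), the bucket count, the n-gram range (3 to 5), the vector size, the random untrained vectors, averaging as the composition rule, and the hand-written segmentations in the hypothetical `MORPH_SEGMENTS` table are all illustrative assumptions.

```python
import math
import random

# Illustrative sizes only (assumptions, not values from the paper).
DIM = 8
NUM_BUCKETS = 2000


def fnv1a_32(text: str) -> int:
    """32-bit FNV-1a hash over the UTF-8 bytes of `text` (stable across runs)."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h


def char_ngrams(word: str, n_min: int = 3, n_max: int = 5) -> list[str]:
    """Unsupervised units: character n-grams of '<word>' with boundary markers."""
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


# Hypothetical analyzer output (Buckwalter transliteration): "wktAbhm" is
# w+ (and) + ktAb (book) + +hm (their). A real system would query a
# morphological analyzer instead of this hand-written table.
MORPH_SEGMENTS = {
    "wktAbhm": ["w+", "ktAb", "+hm"],
    "ktAbhm": ["ktAb", "+hm"],
}


def morph_units(word: str) -> list[str]:
    """Resource-based units: morphemes if known, else the whole word."""
    return MORPH_SEGMENTS.get(word, [word])


class SubwordEmbeddings:
    """Hashed table of unit vectors; a word vector is the mean of its units.

    The vectors are random (untrained) here. In practice they would first be
    trained on monolingual text, then copied into the NMT encoder.
    """

    def __init__(self, dim: int = DIM, buckets: int = NUM_BUCKETS, seed: int = 0):
        rng = random.Random(seed)
        self.dim = dim
        self.table = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(buckets)]

    def unit_vector(self, unit: str) -> list[float]:
        # Units that hash to the same bucket share a vector.
        return self.table[fnv1a_32(unit) % len(self.table)]

    def compose(self, units: list[str]) -> list[float]:
        if not units:
            raise ValueError("a word needs at least one subword unit")
        total = [0.0] * self.dim
        for unit in units:
            for k, x in enumerate(self.unit_vector(unit)):
                total[k] += x
        return [x / len(units) for x in total]


def init_source_embeddings(vocab, emb, units_fn):
    """One row per source-vocabulary word, in vocabulary order.

    A rare word still gets a vector built from units it may share with
    frequent words, instead of a poorly trained word-level vector.
    """
    return [emb.compose(units_fn(word)) for word in vocab]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


if __name__ == "__main__":
    emb = SubwordEmbeddings()
    vocab = ["ktAb", "ktAbhm", "wktAbhm"]
    for name, units_fn in [("char n-grams", char_ngrams), ("morphemes", morph_units)]:
        rows = init_source_embeddings(vocab, emb, units_fn)
        print(name, round(cosine(rows[1], rows[2]), 3))
```

Swapping `units_fn` is the only difference between the unsupervised and the resource-based variant in this sketch, which mirrors the comparison the abstract describes; the resulting rows would be loaded as the initial source embedding matrix of a word-based NMT model, and cosine similarity between rows is the quantity a word similarity evaluation would correlate with human judgments.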

Keywords:

Word, Computer science, Natural language processing, Machine translation, Artificial intelligence, Initialization, Translation, Speech recognition, Similarity, Task, Resource, Arabic, Linguistics

Metrics

FWCI (Field-Weighted Citation Impact): 2.38

Citation Normalized Percentile: 0.89

Topics

Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)

Multimodal Machine Learning Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)

Related Documents

Regressing Word and Sentence Embeddings for Low-Resource Neural Machine Translation

Low-resource neural machine translation with morphological modeling

Dirichlet-Smoothed Word Embeddings for Low-Resource Settings

Augmenting Training Data for Low-Resource Neural Machine Translation via Bilingual Word Embeddings and BERT Language Modelling

From Word Embeddings to Large Vocabulary Neural Machine Translation