Neural networks represent a major advance in modeling for statistical machine translation. These data-driven systems consist of an encoder that computes a representation of the source sentence and a decoder that accesses the encoder output and generates a probability distribution over target sentences. The components are connected via a cross-attention layer and trained jointly to minimize the cross-entropy loss on a corpus of bilingual training data, i.e., a set of sentence pairs in which one sentence is the translation of the other. In this dissertation, we focus on two important aspects of neural machine translation systems, namely the training data and the attention layer.

Since sentence-aligned bilingual data is a scarce resource whose availability depends on the language pair, we investigate the use of monolingual data to improve the performance of the machine translation system. We verify the reported results for the use of synthetic data (back-translation), extend language model fusion, and introduce pre-training to neural machine translation. Using a language model trained on monolingual target data is an established method in count-based machine translation approaches. We adapt this method to neural machine translation and extend it by training the parameters of the translation model as part of a larger fusion model. Furthermore, we use monolingual source and target data to find a better initialization for training. This pre-training also enables the use of monolingual source data, which is commonly ignored in machine translation systems. We evaluate these methods empirically on four language pairs with different data conditions and report improvements for all described methods over a purely bilingual baseline. Overall, back-translation provides the best results with respect to translation performance and data efficiency.

Inspired by existing work on alignment models, we also incorporate a first-order dependency into the attention layer.
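The language model fusion mentioned above can be illustrated with a minimal sketch. The snippet below shows shallow fusion, the simplest variant, in which the log-probabilities of a translation model and a target-side language model are combined at each decoding step; the function name, toy vocabulary, and interpolation weight are illustrative assumptions, not the dissertation's actual implementation.

```python
import numpy as np

def shallow_fusion_step(tm_logprobs, lm_logprobs, lam=0.3):
    """Combine translation-model and language-model log-probabilities
    for one decoding step (shallow fusion). `lam` weights the LM score."""
    return tm_logprobs + lam * lm_logprobs

# Toy vocabulary of 4 tokens: the TM prefers token 0, the LM prefers token 1.
tm = np.log(np.array([0.5, 0.2, 0.2, 0.1]))
lm = np.log(np.array([0.1, 0.6, 0.2, 0.1]))
combined = shallow_fusion_step(tm, lm)
next_token = int(np.argmax(combined))  # token chosen by the fused score
```

The extension studied in the dissertation goes further by training the translation model parameters jointly as part of the fusion model, rather than only combining scores at decoding time.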
In contrast to previous machine translation models, the transformer is a purely feed-forward model without any recurrent layers. This means that no information about previous attention decisions enters the computation of the attention layer. Modeling attention with a first-order dependency allows the attention layer to access previous attention decisions, which is an important prerequisite for expressing, e.g., source coverage. We propose and adapt several extensions that include this time-dependent information. Interpreting attention as a soft lookup of a query in a list of key-value pairs, we introduce the previous attention information in different ways and with different encodings. All methods are verified on several machine translation tasks, and we conclude that a zero-order attention model is sufficiently strong for the task of machine translation.
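The soft-lookup view of attention can be sketched as follows: the query is scored against every key, the scores are normalized with a softmax, and the values are averaged under that distribution. This is a minimal zero-order (scaled dot-product) attention sketch in NumPy; the shapes and random data are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, keys, values):
    """Soft lookup: score the query against all keys, then return the
    values weighted by the resulting attention distribution."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)   # one score per key
    weights = softmax(scores)            # attention distribution (sums to 1)
    return weights @ values, weights     # context vector and weights

# Toy example: 3 key-value pairs, key dimension 4, value dimension 2.
rng = np.random.default_rng(0)
keys = rng.normal(size=(3, 4))
values = rng.normal(size=(3, 2))
query = keys[1]                          # a query close to key 1
context, weights = attention(query, keys, values)
```

A first-order extension, as investigated in the dissertation, would additionally feed the previous step's `weights` into the computation of the next attention distribution, e.g. to track which source positions have already been covered.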