JOURNAL ARTICLE

Lexical Chains meet Word Embeddings in Document-level Statistical Machine Translation

Abstract

Currently under review for EMNLP 2017.

The phrase-based Statistical Machine Translation (SMT) approach deals with sentences in isolation, making it difficult to take discourse context into account during translation. This poses a challenge for ambiguous words that require discourse knowledge to be translated correctly. We propose a method that exploits the semantic similarity captured by lexical chains to improve SMT output, integrating it into a document-level decoder. Unlike the traditional approach, which relies on lexical resources, we build the lexical chains from word embeddings. Experimental results on German-to-English translation show that our method produces correct translations in up to 88% of the changes it makes, improving over the baseline in 36%-48% of them.
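The core idea of the abstract, linking related words across a document into lexical chains by embedding similarity rather than a lexical resource, can be sketched as a greedy chaining procedure. This is a minimal illustration, not the paper's implementation: the toy 2-d vectors, the 0.6 similarity threshold, and the last-member comparison strategy are all assumptions made for the example.

```python
import math

def cosine(u, v):
    # Standard cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def build_chains(words, vectors, threshold=0.6):
    """Greedy lexical chaining: attach each word to the first chain
    whose most recent member is similar enough; otherwise start a
    new chain. Threshold and strategy are illustrative assumptions."""
    chains = []
    for w in words:
        for chain in chains:
            if cosine(vectors[chain[-1]], vectors[w]) >= threshold:
                chain.append(w)
                break
        else:
            chains.append([w])
    return chains

# Toy 2-d "embeddings", assumed purely for illustration.
vecs = {
    "bank":  [0.90, 0.10],
    "money": [0.85, 0.20],
    "river": [0.10, 0.95],
    "water": [0.15, 0.90],
}
print(build_chains(["bank", "money", "river", "water"], vecs))
# → [['bank', 'money'], ['river', 'water']]
```

In a document-level decoder, chains like these would supply a discourse-wide preference for translating an ambiguous word consistently with its chain-mates.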

Keywords:
Computer science, Natural language processing, Machine translation, Statistical machine translation, Artificial intelligence, Word embeddings, Lexical chains, Discourse context, Evaluation of machine translation, Example-based machine translation, Semantic similarity, German, Transfer-based machine translation, Machine translation software usability, Rule-based machine translation, Linguistics

Metrics

Cited by: 14
FWCI (Field-Weighted Citation Impact): 2.52
References: 49
Citation Normalized Percentile: 0.91

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Text Readability and Simplification
Physical Sciences →  Computer Science →  Artificial Intelligence