Improved statistical machine translation using paraphrases

Chris Callison-Burch; Philipp Koehn; Miles Osborne

doi:10.3115/1220835.1220838

Year: 2006 Pages: 17-24

Abstract

Parallel corpora are crucial for training statistical machine translation (SMT) systems. However, for many language pairs they are available only in very limited quantities. For these language pairs, a large proportion of the phrases encountered at run time will be unknown. We show how techniques from paraphrasing can be used to deal with these otherwise unknown source language phrases. Our results show that augmenting a state-of-the-art SMT system with paraphrases leads to significantly improved coverage and translation quality. For a training corpus with 10,000 sentence pairs, we increase the coverage of unique test set unigrams from 48% to 90%, with more than half of the newly covered items accurately translated, as opposed to none in current approaches.
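The coverage figure above can be read as the fraction of distinct test-set unigrams for which the translation model has at least one entry. The sketch below is a minimal illustration under that assumption, not the authors' system. It computes coverage for a toy phrase table, then lets an unknown source word borrow the translations of a known paraphrase. The phrase table, the paraphrase dictionary, the Spanish example words, and the helper names (`unique_unigrams`, `coverage`, `augment_with_paraphrases`) are all hypothetical. The sketch also ignores the probabilities and feature weights that a real SMT system would attach to paraphrase-derived entries.

```python
from typing import Dict, List, Set

# Hypothetical toy data: a source-to-target phrase table and a source-side
# paraphrase dictionary (unweighted, unlike a real paraphrase model).
phrase_table: Dict[str, List[str]] = {
    "la": ["the"],
    "casa": ["house"],
    "es": ["is"],
    "grande": ["big", "large"],
}
paraphrases: Dict[str, List[str]] = {
    "vivienda": ["casa"],
    "enorme": ["grande"],
    "xyz": ["abc"],  # paraphrase that is itself unknown: nothing to borrow
}
test_set: List[str] = ["la vivienda es enorme", "la casa es grande"]


def unique_unigrams(sentences: List[str]) -> Set[str]:
    """Collect the distinct whitespace-separated tokens in a test set."""
    return {tok for s in sentences for tok in s.split()}


def coverage(items: Set[str], table: Dict[str, List[str]]) -> float:
    """Fraction of the given items that have at least one translation."""
    if not items:
        return 0.0
    covered = sum(1 for item in items if item in table)
    return covered / len(items)


def augment_with_paraphrases(
    table: Dict[str, List[str]], paras: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Return a copy of the table in which each unknown source phrase
    borrows the (deduplicated) translations of its known paraphrases."""
    augmented = {src: list(tgts) for src, tgts in table.items()}
    for src, alternatives in paras.items():
        if src in augmented:
            continue  # already translatable; leave the original entry alone
        borrowed = list(
            dict.fromkeys(t for p in alternatives for t in table.get(p, []))
        )
        if borrowed:
            augmented[src] = borrowed
    return augmented


if __name__ == "__main__":
    items = unique_unigrams(test_set)
    baseline = coverage(items, phrase_table)
    augmented = coverage(items, augment_with_paraphrases(phrase_table, paraphrases))
    print(f"{baseline:.2f} -> {augmented:.2f}")  # prints "0.67 -> 1.00"
```

Here two of the six distinct test words (`vivienda`, `enorme`) lack translations, so baseline coverage is 4/6. Borrowing translations from `casa` and `grande` covers all six. Coverage alone does not say whether a borrowed translation is accurate, which is why the abstract reports translation accuracy for the newly covered items separately.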
