JOURNAL ARTICLE

Data selection for statistical machine translation

Abstract

The bilingual corpus strongly affects the performance of a statistical machine translation system. More data generally lead to better performance, but they also increase the computational load. In this paper, we propose methods to estimate a weight for each sentence and use these weights to select the more informative sentences from the training corpus and the development corpus. The translation system is then built and tuned on the resulting compact corpus. The experimental results show that competitive performance can be obtained with much less data.
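The abstract does not say how the sentence weight is computed, so the following is only a minimal sketch of weight-based selection under a stated assumption: each source sentence is weighted by the mean inverse corpus frequency of its tokens, so that sentences containing rarer words count as more informative. The function names `sentence_weights` and `select_pairs`, the `keep_ratio` parameter, and the toy corpus are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def sentence_weights(source_sentences):
    """Assumed weight (not the paper's): mean inverse corpus frequency of tokens."""
    tokenized = [s.lower().split() for s in source_sentences]
    counts = Counter(tok for toks in tokenized for tok in toks)
    weights = []
    for toks in tokenized:
        if not toks:
            # An empty sentence carries no information.
            weights.append(0.0)
        else:
            weights.append(sum(1.0 / counts[t] for t in toks) / len(toks))
    return weights


def select_pairs(pairs, keep_ratio=0.5):
    """Keep the highest-weighted fraction of (source, target) pairs.

    The selected pairs are returned in their original corpus order.
    """
    weights = sentence_weights([src for src, _ in pairs])
    k = max(1, round(len(pairs) * keep_ratio))
    # Rank sentence indices from highest to lowest weight.
    ranked = sorted(range(len(pairs)), key=lambda i: weights[i], reverse=True)
    keep = sorted(ranked[:k])
    return [pairs[i] for i in keep]


# Toy parallel corpus, for illustration only.
corpus = [
    ("the cat sat", "le chat s'est assis"),
    ("the cat sat", "le chat s'est assis"),
    ("a rare protein binds", "une protéine rare se lie"),
    ("the cat", "le chat"),
]
compact = select_pairs(corpus, keep_ratio=0.25)
```

With this toy corpus, the single retained pair is the one whose source words occur nowhere else. A translation system would then be trained and tuned on `compact` instead of the full corpus; any real weighting scheme would replace `sentence_weights`.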

Keywords:
Machine translation; Computer science; Sentence; Artificial intelligence; Translation (biology); Natural language processing; Selection (genetic algorithm); Example-based machine translation; Machine translation software usability

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 1.20
Refs: 15
Citation Normalized Percentile: 0.85

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Semantic Web and Ontologies
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Data analysis and selection for statistical machine translation

Sauleh Eetemadi

Journal:   Michigan State University Libraries Year: 2024
JOURNAL ARTICLE

Survey of data-selection methods in statistical machine translation

Sauleh Eetemadi, William D. Lewis, Kristina Toutanova, Hayder Radha

Journal:   Machine Translation Year: 2015 Vol: 29 (3-4) Pages: 189-223
JOURNAL ARTICLE

Neural Networks Classifier for Data Selection in Statistical Machine Translation

Álvaro Peris, Mara Chinea-Ríos, Francisco Casacuberta

Journal:   The Prague Bulletin of Mathematical Linguistics Year: 2017 Vol: 108 (1) Pages: 283-294
JOURNAL ARTICLE

Bilingual recursive neural network based data selection for statistical machine translation

Derek F. Wong, Yi Lü, Lidia S. Chao

Journal:   Knowledge-Based Systems Year: 2016 Vol: 108 Pages: 15-24