JOURNAL ARTICLE

A virtuous circle: laundering translation memory data using statistical machine translation

Abstract

This study compares consistency in target texts produced using translation memory (TM) with that of target texts produced using statistical machine translation (SMT), where the SMT engine is trained on the same texts as are reused in the TM workflow. These comparisons focus specifically on noun and verb inconsistencies, as such inconsistencies appear to be highly prevalent in TM data. The study replaces inconsistent TM target-text nouns and verbs with consistent nouns and verbs from the SMT output to test whether this improves overall TM consistency and whether an SMT engine trained on the ‘laundered’ TM data performs better than the baseline engine. Improvements were observed in both TM consistency and SMT performance, a finding that indicates the potential of this approach for improving TM/MT integration.

Keywords: translation memory; statistical machine translation; localisation; translation quality; translation consistency

Acknowledgements

This research is supported by the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation (www.cngl.ie) at Dublin City University. Thanks to Prof Andy Way and Dr Jie Jieng for the use of the SmartMATE system.

Notes on contributors

Joss Moorkens, BA, PhD, is a post-doctoral researcher at the Centre for Next Generation Localisation and lecturer in Japanese at the School of Applied Language and Intercultural Studies at Dublin City University. He has authored several journal articles and a book chapter on translation memory, consistency, and translation technology standards. He is currently a member of the Centre for Translation and Textual Studies and is co-editing a forthcoming issue of the journal New Voices in Translation Studies.

Stephen Doherty, BA, HDip, PhD, MBPsS, is a post-doctoral researcher at the Centre for Next Generation Localisation, School of Computing, at Dublin City University. He conducts research on language and cognition and translation technologies. He is currently working on a collaborative European research initiative dedicated to overcoming barriers in translation and language technologies. He has several peer-reviewed and conference publications, and has acted as an editor for several publications, including New Voices in Translation Studies. He also serves on the programme/organising committee for several conferences and workshops, and is on the editorial board for several journals, including the European Psychologist Journal.

Dorothy Kenny, BA, MSc, PhD, is a Senior Lecturer in translation studies and corpus linguistics at Dublin City University. She is a member of the editorial boards of The Translator and Interpreter Trainer and New Voices in Translation Studies. Her publications include Lexis and creativity in translation: A corpus-based study (St. Jerome, 2001) and several co-edited volumes. She has authored numerous articles and chapters on corpus-based translation studies, computer-aided translation, translator training, and translation theory. She is a Board Member of the European Masters in Translation (EMT) and an executive member of the International Association for Translation and Intercultural Studies.

Sharon O'Brien, BA, MA, PhD, is a lecturer in translation studies at Dublin City University and primarily conducts research on translation technology with a focus on controlled language, machine translation, post-editing, and localisation. She teaches software localisation, translation theory, translation technology and research methods for translation studies. She has co-authored a book on research methods for translation studies (St. Jerome, forthcoming) and edited a volume on cognitive explorations of translation (Continuum, 2011). She has also collaborated on numerous projects within the industry, specifically on the topics of machine translation, post-editing and the dynamic framework for quality assessment in the localisation industry.

Notes

1. www.apsic.com/en/products_xbench.html
2. www.smartmate.co
3. ‘Selecting’ was translated in the baseline TM data variously as エレメントの選択, ‘selection of elements’; コールアウトエレメントの選択, ‘selection of callout elements’; アセンブリの選択, ‘selection of assembly’; 多角形の選択, ‘selection of polygon’; 線の選択, ‘selection of a line’; 楕円の選択, ‘selection of ellipse’; 選択, ‘selection’; 長方形の選択, ‘selection of rectangle’; ベジエ曲線の選択, ‘selection of Bezier curve’; めねじの選択, ‘selection of female screw’; おねじの選択, ‘selection of male screw’.
4. ‘Complete’ here means that a target translation is completely inconsistent with a previous translation for the same source segment, as shown in the definition presented earlier.
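The kind of inconsistency described in Notes 3 and 4, where one source segment has several different target translations in the TM, can be illustrated with a minimal sketch. This is not the study's own tooling. The function name `find_inconsistent_sources`, the toy `tm_pairs` data, and the 'Saving' entry are hypothetical, introduced only for illustration; the three 'Selecting' targets are taken from Note 3.

```python
from collections import defaultdict


def find_inconsistent_sources(tm_pairs):
    """Group (source, target) pairs by source text and return only the
    sources that have more than one distinct target translation."""
    targets_by_source = defaultdict(set)
    for source, target in tm_pairs:
        targets_by_source[source.strip()].add(target.strip())
    return {
        source: sorted(targets)
        for source, targets in targets_by_source.items()
        if len(targets) > 1
    }


# Hypothetical toy TM. The 'Selecting' targets come from Note 3;
# the 'Saving' entry is invented to show a consistent segment.
tm_pairs = [
    ("Selecting", "選択"),
    ("Selecting", "エレメントの選択"),
    ("Selecting", "線の選択"),
    ("Saving", "保存"),
    ("Saving", "保存"),
]

inconsistent = find_inconsistent_sources(tm_pairs)
for source, targets in inconsistent.items():
    print(source, len(targets), targets)
```

A check like this only finds segment-level inconsistency for identical source segments. The noun and verb comparison described in the abstract would additionally need word-level analysis, which this sketch does not attempt.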

Keywords:
Consistency (knowledge bases); Machine translation; Computer science; Verb; Noun; Natural language processing; Workflow; Artificial intelligence; Focus (optics); Database

Metrics

Cited By: 30
FWCI (Field Weighted Citation Impact): 0.94
Refs: 16
Citation Normalized Percentile: 0.85

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
