We propose a bilingually motivated segmentation framework for Chinese, which has no clear delimiters for word boundaries. The framework produces Chinese tokens aligned with the words of word-based languages using a bilingual segmentation algorithm over bitexts, and derives a probabilistic tokenization model from previously annotated Chinese sentences. In the bilingual segmentation algorithm, we first recast the search for a segmentation as a sequential tagging problem, allowing for a polynomial-time dynamic programming solution, and then incorporate a control to balance monolingual and bilingual information when tailoring Chinese sentences. Experiments show that our framework, applied as a pre-tokenization component, significantly outperforms existing segmenters in translation quality, suggesting our methodology supports better segmentation for bilingual NLP applications involving isolating languages such as Chinese.
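To illustrate the kind of polynomial-time dynamic programming the abstract refers to, the sketch below finds a maximum-probability segmentation of a Chinese string under a simple unigram lexicon model. The lexicon, its probabilities, and the `max_len` cap are hypothetical; this is a generic DP segmenter, not the paper's bilingual algorithm.

```python
import math

def segment(sentence, probs, max_len=4):
    """Maximum-probability segmentation via dynamic programming.

    best[i] holds the best log-probability of segmenting sentence[:i];
    back[i] stores the start index of the last token in that segmentation.
    Runs in O(n * max_len) for a sentence of n characters.
    """
    n = len(sentence)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            token = sentence[j:i]
            if token in probs:
                score = best[j] + math.log(probs[token])
                if score > best[i]:
                    best[i], back[i] = score, j
    # Recover the best token sequence by walking the back-pointers.
    tokens, i = [], n
    while i > 0:
        tokens.append(sentence[back[i]:i])
        i = back[i]
    return tokens[::-1]

# Toy lexicon with made-up unigram probabilities.
probs = {"中国": 0.4, "中": 0.1, "国": 0.1, "人": 0.2, "中国人": 0.2}
print(segment("中国人", probs))
```

The paper's bilingual setting would additionally score candidate tokens by how well they align to words on the other side of the bitext; the DP skeleton stays the same.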
Bing Zhao, Eric P. Xing, Alex Waibel
Wilson Tam, Ian Lane, Tanja Schultz