JOURNAL ARTICLE

Lexical triggers and latent semantic analysis for cross-lingual language model adaptation

Woosung Kim, Sanjeev Khudanpur

Year: 2004
Journal: ACM Transactions on Asian Language Information Processing
Vol: 3 (2), Pages: 94-112
Publisher: Association for Computing Machinery

Abstract

In-domain texts for estimating statistical language models are not easily found for most languages of the world. We present two techniques to take advantage of in-domain text resources in other languages. First, we extend the notion of lexical triggers, which have been used monolingually for language model adaptation, to the cross-lingual problem, permitting the construction of sharper language models for a target-language document by drawing statistics from related documents in a resource-rich language. Next, we show that cross-lingual latent semantic analysis is similarly capable of extracting useful statistics for language modeling. Neither technique requires explicit translation capabilities between the two languages. We demonstrate significant reductions in both perplexity and word error rate on a Mandarin speech recognition task by using these techniques.
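The cross-lingual latent semantic analysis described above can be illustrated with a small sketch. This is not the paper's implementation; the bilingual vocabulary, the document-aligned counts, and the `fold_in` helper are all invented toy assumptions. The idea it demonstrates is the one the abstract states: with only document-aligned text (no translation lexicon), an SVD over a stacked bilingual term-by-document matrix places words of both languages in a shared latent space, so a new target-language document can be matched to related source-language statistics.

```python
# Toy sketch of cross-lingual LSA (CL-LSA), assuming document-aligned
# bilingual text. Vocabulary and counts below are invented for illustration.
import numpy as np

# Words of BOTH languages stacked into one vocabulary; each matrix column
# is one aligned English/Mandarin document pair (2 finance, 2 sports).
vocab = ["stock", "market", "gu", "shi",      # finance (English / pinyin)
         "game", "team", "bisai", "qiudui"]   # sports  (English / pinyin)
W = np.array([
    [2, 1, 0, 0],  # stock
    [1, 2, 0, 0],  # market
    [2, 1, 0, 0],  # gu
    [1, 1, 0, 0],  # shi
    [0, 0, 2, 1],  # game
    [0, 0, 1, 2],  # team
    [0, 0, 2, 2],  # bisai
    [0, 0, 1, 1],  # qiudui
], dtype=float)

# Rank-k SVD yields latent representations for words (rows of U*s).
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2
word_vecs = U[:, :k] * s[:k]

def fold_in(counts):
    """Project a new monolingual document's word counts into the latent space."""
    return counts @ U[:, :k] / s[:k]

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# A new Mandarin-side document mentioning only "gu" and "shi" ...
doc = np.zeros(len(vocab))
doc[vocab.index("gu")] = doc[vocab.index("shi")] = 1.0
d = fold_in(doc)

# ... lands closer to English finance words than to sports words,
# even though no word-level translation was ever supplied.
sim = {w: cos(d, word_vecs[i]) for i, w in enumerate(vocab)}
assert sim["stock"] > sim["game"]
```

The fold-in step is the standard LSA trick for projecting unseen documents; in the cross-lingual setting it is what lets statistics from the resource-rich language be pulled in for a target-language document.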

Keywords:
Perplexity, Computer science, Natural language processing, Artificial intelligence, Language model, Latent semantic analysis, Mandarin Chinese, Machine translation, Lexical triggers, Cache language model, Language model adaptation, Speech recognition

Metrics

Cited By: 33
FWCI (Field Weighted Citation Impact): 3.09
References: 16
Citation Normalized Percentile: 0.92 (in top 10%)


Topics

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)

