JOURNAL ARTICLE

Cross-lingual latent semantic analysis for language modeling

Abstract

Statistical language model estimation requires large amounts of domain-specific text, which is difficult to obtain in many languages. We propose techniques which exploit domain-specific text in a resource-rich language to adapt a language model in a resource-deficient language. A primary advantage of our technique is that in the process of cross-lingual language model adaptation, we do not rely on the availability of any machine translation capability. Instead, we assume that only a modest-sized collection of story-aligned document-pairs in the two languages is available. We use ideas from cross-lingual latent semantic analysis to develop a single low-dimensional representation shared by words and documents in both languages, which enables us to (i) find documents in the resource-rich language pertaining to a specific story in the resource-deficient language, and (ii) extract statistics from the pertinent documents to adapt a language model to the story of interest. We demonstrate significant reductions in perplexity and error rates in a Mandarin speech recognition task using this technique.
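The two steps in the abstract can be illustrated with a small sketch in Python with NumPy. It is not the authors' implementation. The toy vocabulary (English words and pinyin-style tokens standing in for Mandarin), the rank `k = 2`, the similarity-based word mapping with exponent `gamma`, and the interpolation weight `lam` are all assumptions made for illustration. Each column of the matrix is one story-aligned document pair, so a truncated SVD places the words of both languages in one latent space.

```python
import numpy as np

def build_pair_matrix(pairs, vocab_a, vocab_b):
    """Word-by-document counts; each column is one aligned (doc_a, doc_b) pair."""
    rows = {("a", w): i for i, w in enumerate(vocab_a)}
    rows.update({("b", w): len(vocab_a) + i for i, w in enumerate(vocab_b)})
    m = np.zeros((len(rows), len(pairs)))
    for j, (doc_a, doc_b) in enumerate(pairs):
        for w in doc_a:
            m[rows[("a", w)], j] += 1
        for w in doc_b:
            m[rows[("b", w)], j] += 1
    return m, rows

def fold_in(doc, lang, rows, u_k):
    """Project a monolingual bag of words into the shared latent space."""
    v = np.zeros(u_k.shape[0])
    for w in doc:
        if (lang, w) in rows:
            v[rows[(lang, w)]] += 1
    return u_k.T @ v

def cosine(x, y):
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0

def cross_lingual_unigram(doc_a, vocab_b, rows, word_vecs, gamma=1.0):
    """Assumed mapping: P(c | e) proportional to max(cos(c, e), 0) ** gamma."""
    p = np.zeros(len(vocab_b))
    for e in doc_a:
        if ("a", e) not in rows:
            continue  # skip words outside the toy vocabulary
        sims = np.array([
            max(cosine(word_vecs[rows[("b", c)]], word_vecs[rows[("a", e)]]), 0.0) ** gamma
            for c in vocab_b
        ])
        if sims.sum() > 0:
            p += sims / sims.sum()
    return p / p.sum() if p.sum() > 0 else np.full(len(vocab_b), 1.0 / len(vocab_b))

# Hypothetical toy data: language "a" is resource-rich, "b" is resource-deficient.
vocab_a = ["bank", "loan", "money", "match", "goal", "team"]
vocab_b = ["yinhang", "daikuan", "qian", "bisai", "qiudui", "jinqiu"]
pairs = [
    (["bank", "loan", "bank"], ["yinhang", "daikuan"]),
    (["bank", "money"], ["yinhang", "qian"]),
    (["match", "goal", "team"], ["bisai", "qiudui"]),
    (["team", "goal"], ["qiudui", "jinqiu"]),
]

m, rows = build_pair_matrix(pairs, vocab_a, vocab_b)
u, s, _ = np.linalg.svd(m, full_matrices=False)
k = 2                       # latent dimension (assumed)
u_k, s_k = u[:, :k], s[:k]
word_vecs = u_k * s_k       # one latent vector per word, both languages

# Step (i): find the language-a document closest to a language-b story.
story_b = ["yinhang", "qian"]
candidates_a = [["loan", "bank", "money"], ["goal", "team", "match"]]
query = fold_in(story_b, "b", rows, u_k)
scores = [cosine(query, fold_in(d, "a", rows, u_k)) for d in candidates_a]
best = int(np.argmax(scores))

# Step (ii): derive language-b unigram statistics from the retrieved document
# and interpolate them with a baseline model (uniform here, as a stand-in).
p_cross = cross_lingual_unigram(candidates_a[best], vocab_b, rows, word_vecs)
baseline = np.full(len(vocab_b), 1.0 / len(vocab_b))
lam = 0.5                   # interpolation weight (assumed)
adapted = lam * p_cross + (1.0 - lam) * baseline
```

In this toy setting the finance document is retrieved for the finance story, and the adapted unigram shifts probability mass toward finance terms in language "b". A real system would replace the uniform baseline with an n-gram model and tune `k`, `gamma`, and `lam` on held-out data.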

Keywords:
Computer science; Perplexity; Natural language processing; Language model; Artificial intelligence; Machine translation; Cache language model; Latent semantic analysis; Language identification; Resource (disambiguation); Process (computing); Domain (mathematical analysis); Universal Networking Language; Natural language; Programming language; Comprehension approach

Metrics

Cited By: 27
FWCI (Field Weighted Citation Impact): 2.70
Refs: 12
Citation Normalized Percentile: 0.91

Topics

Topic Modeling; Natural Language Processing Techniques; Speech Recognition and Synthesis
Field (all three topics): Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Cross-lingual latent semantic analysis

William Cox, Brandon Pincombe

Journal:   ANZIAM Journal Year: 2008 Vol: 48 Pages: 1054-1054
JOURNAL ARTICLE

Lexical triggers and latent semantic analysis for cross-lingual language model adaptation

Woosung Kim, Sanjeev Khudanpur

Journal:   ACM Transactions on Asian Language Information Processing Year: 2004 Vol: 3 (2) Pages: 94-112