Sub-lexical language models for German LVCSR

Amr El-Desoky Mousa; M. Ali Basha Shaik; Ralf Schlüter; Hermann Ney

doi:10.1109/slt.2010.5700846

JOURNAL ARTICLE

Year: 2010 Pages: 171-176

Abstract

One of the major difficulties in German LVCSR is the rich morphology of German, which leads to high out-of-vocabulary (OOV) rates and high language model (LM) perplexities. Compound words normally make up an essential fraction of the German vocabulary, and most compound OOVs are composed of frequent in-vocabulary words. Here, we investigate the use of sub-lexical LMs based on different approaches to word decomposition, namely supervised and unsupervised decomposition, as well as decomposition derived from grapheme-to-phoneme (G2P) conversion. In the latter approach, we augment a normal word model with a set of grapheme-phoneme pairs, called graphones, which are used to model the OOV words. A novel approach is proposed to select the representative graphone sequences for OOVs based on unsupervised decomposition and word-pronunciation alignment. We obtain relative reductions in word error rate (WER) ranging from 4.2% to 6.5% with respect to a comparable full-word system.
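To illustrate the observation that most compound OOVs are built from in-vocabulary words, the following is a minimal sketch, not the decomposition method used in the paper. It assumes a toy vocabulary and a hypothetical helper `decompose` that splits a word by plain recursive search, ignoring linking elements and frequency information; the example sentence and all names are invented for illustration.

```python
def decompose(word, vocab, min_len=3):
    """Split `word` into in-vocabulary parts; return None if no split exists.

    Hypothetical helper: a plain recursive search, not the supervised or
    unsupervised decomposition described in the abstract.
    """
    if word in vocab:
        return [word]
    # Try every split point that leaves at least `min_len` characters per side.
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        if head in vocab:
            rest = decompose(tail, vocab, min_len)
            if rest is not None:
                return [head] + rest
    return None


def oov_rate(tokens, vocab):
    """Fraction of tokens that are not in the vocabulary."""
    return sum(t not in vocab for t in tokens) / len(tokens)


# Toy data (assumed for illustration only).
vocab = {"das", "ist", "der", "haus", "tür", "schlüssel"}
text = ["das", "ist", "der", "haustürschlüssel"]

full_word_oov = oov_rate(text, vocab)

# Replace each word by its parts when a decomposition is found.
sub_tokens = []
for w in text:
    parts = decompose(w, vocab)
    sub_tokens.extend(parts if parts else [w])

sub_word_oov = oov_rate(sub_tokens, vocab)

print(full_word_oov, sub_word_oov)
```

In this toy setting the compound is the only OOV token at the full-word level and disappears once it is split into known parts. Real German compounds often contain linking elements that this sketch does not handle.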

Keywords:

Computer science; Pronunciation; German; Artificial intelligence; Natural language processing; Vocabulary; Grapheme; Word (group theory); Word error rate; Speech recognition; Language model; Set (abstract data type); Decomposition; Compound; Linguistics

Metrics

FWCI (Field Weighted Citation Impact): 4.81

Citation Normalized Percentile: 0.95

Topics

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Sub-word Language Models for German LVCSR

Feature-rich sub-lexical language models using a maximum entropy approach for German LVCSR

Hybrid language models using mixed types of sub-lexical units for open vocabulary German LVCSR

Morpheme based factored language models for German LVCSR

Morpheme level feature-based language models for German LVCSR