German is a highly inflectional language, in which a large number of words can be generated from the same root. It also makes liberal use of compounding, leading to high Out-of-Vocabulary (OOV) rates and poor Language Model (LM) probability estimates. Therefore, the use of morphemes for language modeling is considered a better choice for Large Vocabulary Continuous Speech Recognition (LVCSR) than full words, as it achieves better lexical coverage and lower LM perplexities. On the other hand, Factored Language Models (FLMs) are a successful approach that allows the integration of multiple information sources to obtain better LM probability estimates. In this paper, we try a combined methodology for language modeling in which both morphological decomposition and factored language modeling are used in a single model, called a morpheme-based FLM. Finally, we obtain around 2.5% relative reduction in Word Error Rate (WER) with respect to a traditional full-words system.
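The lexical-coverage argument above can be illustrated with a minimal toy sketch (not the paper's actual decomposition method): when German compounds are split into known sub-word units, a fixed lexicon covers tokens it would otherwise miss, lowering the OOV rate. All words and the greedy splitter below are hypothetical illustrations.

```python
# Toy illustration: morphological decomposition reduces OOV rate.
# The lexicon, corpus, and greedy splitter are illustrative assumptions,
# not the decomposition procedure used in the paper.

def decompose(word, morphs):
    """Greedily split `word` left-to-right into units from `morphs`.
    Returns the list of units, or None if no full segmentation exists."""
    if not word:
        return []
    for i in range(len(word), 0, -1):
        if word[:i] in morphs:
            rest = decompose(word[i:], morphs)
            if rest is not None:
                return [word[:i]] + rest
    return None

def oov_rate(tokens, vocab):
    """Fraction of tokens not covered by `vocab`."""
    return sum(t not in vocab for t in tokens) / len(tokens)

# A tiny vocabulary of simple words, reused as morpheme inventory.
vocab = {"haus", "tür", "schloss"}
corpus = ["haustür", "türschloss", "haus"]  # two unseen compounds

# Full-word modeling: the compounds are OOV.
before = oov_rate(corpus, vocab)

# Morpheme-based modeling: compounds decompose into in-vocabulary units.
after_tokens = []
for w in corpus:
    after_tokens.extend(decompose(w, vocab) or [w])
after = oov_rate(after_tokens, vocab)
```

In this toy setup the full-word OOV rate is 2/3, while after decomposition every token is covered, mirroring (in miniature) why morpheme units give better lexical coverage.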
Amr El-Desoky Mousa, M. Ali Basha Shaik, Ralf Schlüter, Hermann Ney