JOURNAL ARTICLE

Morpheme based factored language models for German LVCSR

Abstract

German is a highly inflected language in which a large number of words can be generated from the same root. It also makes liberal use of compounding, which leads to high Out-of-Vocabulary (OOV) rates and poor Language Model (LM) probability estimates. The use of morphemes rather than full words for language modeling is therefore considered a better choice for Large Vocabulary Continuous Speech Recognition (LVCSR), yielding better lexical coverage and lower LM perplexities. On the other hand, Factored Language Models (FLMs) are a successful approach that allows the integration of multiple information sources to obtain better LM probability estimates. In this paper, we investigate a combined methodology in which both morphological decomposition and factored language modeling are used in a single model, a morpheme-based FLM. With this model we obtain around a 2.5% relative reduction in Word Error Rate (WER) with respect to a traditional full-word system.
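To illustrate the two ingredients the abstract combines, the sketch below shows a toy lexicon-driven decomposition of a German compound into morphemes, with each morpheme annotated by a positional factor of the kind one might feed into a factored-LM toolkit. The lexicon, factor names, and greedy splitting strategy are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch: morpheme decomposition + FLM-style factor tagging.
# LEXICON and the B/I/E/S tag set are illustrative assumptions.
LEXICON = {"kranken", "haus", "versicherung", "auto", "bahn"}

def split_compound(word, lexicon=LEXICON):
    """Greedy left-to-right split of a compound into known morphemes.
    Returns the whole word unsplit if no full decomposition is found."""
    word = word.lower()
    parts, rest = [], word
    while rest:
        for end in range(len(rest), 0, -1):  # try the longest match first
            if rest[:end] in lexicon:
                parts.append(rest[:end])
                rest = rest[end:]
                break
        else:
            return [word]  # no decomposition possible; keep the full word
    return parts

def to_factors(word):
    """Attach positional factors (B=begin, I=inside, E=end, S=single)
    to each morpheme, as a simple extra information source for an FLM."""
    parts = split_compound(word)
    if len(parts) == 1:
        return [(parts[0], "S")]
    tags = ["B"] + ["I"] * (len(parts) - 2) + ["E"]
    return list(zip(parts, tags))

print(to_factors("Krankenhaus"))
# → [('kranken', 'B'), ('haus', 'E')]
```

In a real system the decomposition would come from a trained morphological analyzer, and the factored n-gram over (morpheme, tag) pairs would be estimated by an FLM toolkit rather than written by hand.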

Keywords:
Morpheme, Language model, Factored language models, Speech recognition, German, LVCSR, Word error rate, Natural language processing

Metrics

Cited by: 12
FWCI (Field Weighted Citation Impact): 2.74
References: 22
Citation Normalized Percentile: 0.92


Topics

Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)