JOURNAL ARTICLE

Morpheme-Based Language Modeling for Arabic LVCSR

Abstract

In this paper, we concentrate on Arabic speech recognition. Taking advantage of the rich morphological structure of the language, we use morpheme-based language modeling to reduce the word error rate. We propose a simple constraining method to rid the decoding output of illegal morpheme sequences. We report results for word and morpheme language models (LMs) using medium (64 kw) and large (~800 kw) vocabularies; the morpheme LM obtains an absolute improvement of 2.4% for the former and only 0.2% for the latter. The 2.4% gain surpasses previous gains for morpheme-based LMs for Arabic, and the large-vocabulary runs represent the first comparative results for vocabularies of this size for any language. Finally, we analyze the performance of the morpheme LM on out-of-vocabulary (OOV) words.

Keywords:
Morpheme, Computer science, Word error rate, Vocabulary, Word (group theory), Speech recognition, Natural language processing, Language model, Artificial intelligence, Arabic, Focus (optics), Linguistics

Metrics

Cited By: 44
FWCI (Field Weighted Citation Impact): 5.89
Refs: 16
Citation Normalized Percentile: 0.96

Topics

Natural Language Processing Techniques: Physical Sciences → Computer Science → Artificial Intelligence
Speech Recognition and Synthesis: Physical Sciences → Computer Science → Artificial Intelligence
Topic Modeling: Physical Sciences → Computer Science → Artificial Intelligence