JOURNAL ARTICLE

Towards improving Khoja rule-based Arabic stemmer

Abstract

Stemming algorithms remove irrelevant morphological variation from words and extract the stem or root from which an input word is derived. Stemming thereby helps standardize terms that refer to the same concept. These algorithms are widely used in information retrieval systems and Web search engines, as well as in other applications such as machine translation, text clustering, text summarization, question answering, indexing, text mining, and text classification. The Khoja stemmer is a standard Arabic stemmer, but it has a number of flaws. Previous studies and this one show that the Khoja stemmer performs better than the two other competing stemmers evaluated in this study. The Khoja stemmer and the two other evaluated Arabic stemmers rely mainly on (Patterns, Forms, "***"). Identifying the flaws therefore leads to identifying the patterns missing from the Khoja stemmer. The enhancement to the Khoja stemmer is accordingly restricted to adding the missing patterns, which improves its accuracy by around 5%.
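
As a rough illustration of pattern-based root extraction, and of why a missing pattern causes a stemming failure, the following minimal Python sketch may help. It is not the Khoja stemmer's implementation: the pattern list, the function names match_pattern and extract_root, and the example words are assumptions chosen for illustration, and a real stemmer would also strip affixes and validate roots against a dictionary. The letters ف, ع and ل in a pattern are the conventional placeholders for the three root consonants.

```python
# Placeholder letters that stand for the three root consonants in a pattern.
ROOT_SLOTS = "فعل"

# Hypothetical, deliberately tiny pattern list (assumption, not Khoja's list).
# Each pattern contains every placeholder letter exactly once.
PATTERNS = ["فاعل", "مفعول", "فعال", "مفعل"]


def match_pattern(word, pattern):
    """Return the three-letter root if `word` fits `pattern`, else None."""
    if len(word) != len(pattern):
        return None
    root = {}
    for word_char, pattern_char in zip(word, pattern):
        if pattern_char in ROOT_SLOTS:
            # Placeholder position: this letter of the word belongs to the root.
            root[pattern_char] = word_char
        elif word_char != pattern_char:
            # Fixed pattern letter that the word does not share.
            return None
    return "".join(root[slot] for slot in ROOT_SLOTS)


def extract_root(word, patterns=PATTERNS):
    """Try each pattern in order and return the first root found, else None."""
    for pattern in patterns:
        root = match_pattern(word, pattern)
        if root is not None:
            return root
    return None


if __name__ == "__main__":
    for word in ["كاتب", "مكتوب", "كتاب", "مكتبة"]:
        print(word, extract_root(word))
    # Adding one more pattern (hypothetical extension) covers the last word.
    print("مكتبة", extract_root("مكتبة", PATTERNS + ["مفعلة"]))
```

In this sketch the word مكتبة is left unstemmed until the pattern مفعلة is added to the list, which mirrors the kind of enhancement the abstract describes: coverage improves by extending the pattern set rather than by changing the matching procedure.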

Keywords:
Arabic, Computer science, Natural language processing, Artificial intelligence, Linguistics, Philosophy

Metrics

Cited By: 21
FWCI (Field Weighted Citation Impact): 3.30
Refs: 12
Citation Normalized Percentile: 0.93

Topics

Speech and dialogue systems
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence