JOURNAL ARTICLE

Arabic light-based stemming: a comparative study among light10 stemmer, P-stemmer, and Conditional light stemmer

Abstract

Arabic stemming is a key preprocessing stage in natural language processing (NLP). It strips affixes from words and improves both text classification (TC) and information retrieval (IR). There are two types of stemming: light-based and root-based. Light-based stemming, which removes only prefixes and suffixes from words, is considered the more powerful of the two. Three well-known light stemmers are the Light10 stemmer, the P-stemmer, and the conditional light stemmer (CondLight). The Light10 stemmer removes prefixes and suffixes under a few conditions. The P-stemmer removes only prefixes, while the CondLight stemmer works like the Light10 stemmer but applies eight conditions. We evaluated the stemmers by measuring how much each improves Arabic TC. The evaluation uses three classifiers, the Support Vector Machine (SVM), the k-nearest neighbor algorithm (KNN), and Naive Bayes (NB), together with a statistical similarity measurement. The results show a small improvement with stemming (about 2 percent).
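
The affix stripping described in the abstract can be sketched as follows. This is a minimal Python illustration, not the authors' implementation. The prefix and suffix lists follow commonly cited descriptions of Light10 and should be treated as assumptions, as should the minimum stem length of two letters; all function names are hypothetical. CondLight's eight conditions are not given in the abstract, so they are not modeled here.

```python
# Illustrative affix lists (assumed, not taken from the abstract).
# Longer prefixes come first so that "وال" is tried before "ال".
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]
MIN_STEM_LEN = 2  # assumed: never strip an affix if fewer letters would remain


def strip_prefix(word: str) -> str:
    """Remove at most one prefix, if enough letters remain afterwards."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            return word[len(prefix):]
    return word


def strip_suffixes(word: str) -> str:
    """Make one pass over the suffix list, removing each suffix that matches."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
    return word


def light_stem(word: str) -> str:
    """Light10-style sketch: strip a prefix, then strip suffixes."""
    return strip_suffixes(strip_prefix(word))


def prefix_only_stem(word: str) -> str:
    """P-stemmer-style sketch: strip a prefix and leave suffixes alone."""
    return strip_prefix(word)


if __name__ == "__main__":
    word = "المعلمون"  # "the teachers"
    print(light_stem(word))        # معلم
    print(prefix_only_stem(word))  # معلمون
```

Running both functions on the same word shows the distinction the abstract draws: the Light10-style function removes the article and the plural ending, while the P-stemmer-style function removes only the article.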

Keywords:
Artificial intelligence; Computer science; Prefix; Natural language processing; Preprocessor; Root (linguistics); Support vector machine; Pattern recognition (psychology); Suffix; Speech recognition

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 12
Citation Normalized Percentile: 0.23

Topics

Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Advanced Text Analysis Techniques (Physical Sciences → Computer Science → Artificial Intelligence)