JOURNAL ARTICLE

Arabic Light Stemming: A Comparative Study between P-Stemmer, Khoja Stemmer, and Light10 Stemmer

Abstract

Arabic is a highly derivational language with a deep morphological structure, and much of a word's meaning is carried by its morphology; this morphological dependency is one of the main challenges of processing Arabic text. Arabic Natural Language Processing (ANLP) tools are required for many tasks, such as machine learning. In text classification, ANLP tools are applied as preprocessing steps, including, but not limited to, stemming, normalization, and stop-word removal. In this work, we collected 2,000 news articles from Arabic online newspapers and classified them using Support Vector Machine (SVM) and Naive Bayes (NB) classifiers. The classification task was conducted to compare three Arabic light stemmers: P-Stemmer, Khoja Stemmer, and Light10 Stemmer. P-Stemmer outperformed the other two stemmers with both classifiers, achieving an F1-measure of 0.92 with SVM and 0.90 with NB.
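The preprocessing steps named above (normalization, stop-word removal, light stemming) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the affix lists follow the Light10 lists as commonly reported in the literature, while the normalization rules, the length thresholds as implemented here, and the tiny `STOP_WORDS` set are assumptions made for the example. P-Stemmer and the Khoja stemmer are not reproduced.

```python
import re

# Assumed normalization: strip harakat/shadda/sukun (U+064B..U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")

# Affix lists as commonly reported for Light10; treat them as an assumption here.
DEFINITE_ARTICLES = ["ال", "وال", "بال", "كال", "فال", "لل"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]

# Tiny hypothetical stop-word list, for illustration only.
STOP_WORDS = {"في", "من", "على", "إلى", "عن", "أن"}


def normalize(word: str) -> str:
    """Orthographic normalization applied before stemming (assumed rules)."""
    word = _DIACRITICS.sub("", word)
    for alef in ("آ", "أ", "إ"):          # unify alef variants to bare alef
        word = word.replace(alef, "ا")
    if word.endswith("ى"):                # word-final alef maqsura -> yaa
        word = word[:-1] + "ي"
    if word.endswith("ة"):                # word-final taa marbuta -> haa
        word = word[:-1] + "ه"
    return word


def light_stem(word: str) -> str:
    """Light stemming: strip a prefix group and a short run of suffixes, no root extraction."""
    # 1. Conjunction waw, only if at least 3 letters remain.
    if word.startswith("و") and len(word) - 1 >= 3:
        word = word[1:]
    # 2. One definite-article form (longest first), only if at least 2 letters remain.
    for art in sorted(DEFINITE_ARTICLES, key=len, reverse=True):
        if word.startswith(art) and len(word) - len(art) >= 2:
            word = word[len(art):]
            break
    # 3. Walk the suffix list once, in order, stripping each match that leaves >= 2 letters.
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 2:
            word = word[: -len(suf)]
    return word


# Normalize the stop list the same way so lookups match normalized tokens.
_NORMALIZED_STOPS = {normalize(w) for w in STOP_WORDS}


def preprocess(text: str) -> list[str]:
    """Normalize each whitespace token, drop stop words, then light-stem the rest."""
    tokens = (normalize(t) for t in text.split())
    return [light_stem(t) for t in tokens if t not in _NORMALIZED_STOPS]


if __name__ == "__main__":
    print(preprocess("ذهب المعلمون إلى المدرسة"))  # ['ذهب', 'معلم', 'مدرس']
```

For the sentence "ذهب المعلمون إلى المدرسة" ("the teachers went to the school"), the stop word إلى is dropped and the remaining tokens reduce to ذهب, معلم, and مدرس. The length guards matter: a short word such as ولد ("boy") keeps its initial و because stripping it would leave only two letters. In a classification setup, the stemmed tokens would then typically be converted into feature vectors (for example, TF-IDF) before training the SVM or NB classifier; that step is not shown.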

Metrics: cited by 25; FWCI (Field-Weighted Citation Impact) 1.54; 31 references; citation normalized percentile 0.87.

Topics (Physical Sciences → Computer Science → Artificial Intelligence): Text and Document Classification Technologies; Advanced Text Analysis Techniques; Sentiment Analysis and Opinion Mining.