JOURNAL ARTICLE

P-Stemmer or NLTK Stemmer for Arabic Text Classification?

Abstract

Natural Language Processing (NLP) is a branch of computer science that focuses on developing systems that allow computers to communicate with people using everyday language. NLP tools are devoted to making computers understand statements written in human language. Indexing, text retrieval, and word processing are considered challenges in the classification process. Hence, Arabic Natural Language Processing (ANLP) tools are needed to address these tasks. ANLP includes preprocessing steps such as stemming, normalization, stop-word removal, part-of-speech (POS) tagging, and other processes. In this work, we collected 1,000 news articles from the Alghad.com newspaper and classified the dataset with Support Vector Machine (SVM) and Naive Bayes (NB) algorithms using the NLTK tool. We compared two stemmers, P-Stemmer and the NLTK stemmer, using this classification process. P-Stemmer gave better classification results than the NLTK stemmer for both classifiers.
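The comparison described above (same classifier, same data, different stemmer) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the prefix list in `prefix_stem` is hypothetical and is not the published P-Stemmer rule set, the four toy documents stand in for the 1,000-article dataset, and a small standard-library Naive Bayes replaces the NLTK classifiers so that the example runs without extra dependencies.

```python
import math
from collections import Counter, defaultdict

# Hypothetical prefix list for illustration; NOT the actual P-Stemmer rules.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")

def prefix_stem(token):
    """Strip at most one leading prefix, keeping at least three letters."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= 3:
            return token[len(prefix):]
    return token

def identity_stem(token):
    """Baseline: no stemming at all."""
    return token

def tokenize(text, stem):
    return [stem(tok) for tok in text.split()]

def train_nb(samples, stem):
    """Collect per-class word counts for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in samples:
        tokens = tokenize(text, stem)
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab

def predict_nb(model, text, stem):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):  # sorted for deterministic tie-breaking
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for tok in tokenize(text, stem):
            if tok in vocab:  # words never seen in training are ignored
                count = word_counts[label][tok] + 1
                score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(train, test, stem):
    model = train_nb(train, stem)
    hits = sum(predict_nb(model, text, stem) == label for text, label in test)
    return hits / len(test)

# Toy data (made up): training words carry the definite article, test words do not.
TRAIN = [("الفريق فاز المباراة", "sport"), ("الحكومة اقتصاد البنك", "economy")]
TEST = [("فريق مباراة", "sport"), ("بنك حكومة", "economy")]

if __name__ == "__main__":
    for name, stem in [("no stemming", identity_stem), ("prefix stemming", prefix_stem)]:
        print(name, accuracy(TRAIN, TEST, stem))
```

To compare against an NLTK stemmer in the same harness, one option is to pass the `stem` method of an NLTK Arabic stemmer (for example `nltk.stem.isri.ISRIStemmer().stem`) as the `stem` argument; which NLTK stemmer the authors used is not stated in the abstract.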

Keywords:
Computer science; Natural language processing; Artificial intelligence; Preprocessor; Search engine indexing; Normalization (sociology); Arabic; Support vector machine; Word (group theory); Lemmatisation; Text processing; Word processing; Speech recognition; Linguistics

Metrics

Cited By: 6
FWCI (Field Weighted Citation Impact): 0.61
Refs: 33
Citation Normalized Percentile: 0.76

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence