JOURNAL ARTICLE

Arabic information retrieval: Stemming or lemmatization?

Abstract

The Arabic language is spreading worldwide. According to UNESCO, Arabic is spoken by more than 422 million native speakers in around 29 countries, and the 1.6 billion Muslims worldwide use it to perform their daily prayers. The presence of Arabic on the internet grew by around 6,091% over the fifteen years from 2000 to 2015, the highest growth among the ten most used online languages. As a result, the number of Arabic documents is increasing rapidly, which calls for improved Arabic Information Retrieval (IR) techniques. Many researchers agree that both stemming and lemmatization benefit IR, particularly for highly inflected languages, short documents, and settings with limited storage space. The chief purpose of the current study is to assess the impact of stemming and lemmatization on Arabic IR. In this paper, we illustrate several concepts of Arabic morphology, including stemming and lemmatization algorithms. Then, we highlight the use of these techniques and their benefits for different Arabic IR systems. Finally, an experiment is conducted to count the occurrences of all Quranic surface word, stem, and lemma forms by searching for matching forms in both Classical Arabic and Modern Standard Arabic resources. To do so, the recent and efficient analyzers AlKhalil Morpho Sys and MADAMIRA are used.
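The difference between the two normalizations, and the kind of surface/stem/lemma coverage count the experiment describes, can be illustrated with a small Python sketch. It rests on several assumptions: the affix lists and the minimum-length rules of `light_stem` are illustrative choices, `LEMMAS` is a hypothetical lookup table standing in for the output of an analyzer such as AlKhalil Morpho Sys or MADAMIRA (whose APIs are not reproduced here), and `coverage` is one plausible way to measure the share of source forms found in a reference resource, not necessarily the measure used in the study.

```python
import re

# Tashkeel (short vowels, tanwin, shadda, sukun: U+064B..U+0652) plus the
# superscript alef U+0670, which is frequent in Quranic orthography.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

# Illustrative affix lists: an assumption for this sketch, not the lists used
# by AlKhalil Morpho Sys, MADAMIRA, or any specific published stemmer.
PREFIXES = sorted(["وال", "بال", "كال", "فال", "لل", "ال", "و"], key=len, reverse=True)
SUFFIXES = sorted(["هما", "ها", "هم", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي"],
                  key=len, reverse=True)

# Hypothetical surface-form -> lemma table standing in for the output of a
# morphological analyzer; a real system would obtain lemmas from such a tool.
LEMMAS = {"القلوب": "قلب", "قلبه": "قلب", "قلب": "قلب",
          "الكتاب": "كتاب", "كتابات": "كتاب"}


def strip_diacritics(word: str) -> str:
    """Remove vowel marks so vocalized and unvocalized spellings compare equal."""
    return _DIACRITICS.sub("", word)


def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix (longest match first)."""
    w = strip_diacritics(word)
    for p in PREFIXES:
        # A bare 'و' (and) needs a longer remainder, since many words start with it.
        min_rest = 3 if p == "و" else 2
        if w.startswith(p) and len(w) - len(p) >= min_rest:
            w = w[len(p):]
            break
    for s in SUFFIXES:
        if w.endswith(s) and len(w) - len(s) >= 2:
            w = w[:-len(s)]
            break
    return w


def lemma(word: str) -> str:
    """Look up the lemma; fall back to the unvocalized surface form if unknown."""
    w = strip_diacritics(word)
    return LEMMAS.get(w, w)


def coverage(source_tokens, reference_tokens, normalize) -> float:
    """Share of distinct normalized source forms that also occur in the reference."""
    src = {normalize(t) for t in source_tokens}
    ref = {normalize(t) for t in reference_tokens}
    return len(src & ref) / len(src) if src else 0.0


if __name__ == "__main__":
    source = ["القلوب", "قلبه", "الكتاب"]  # toy stand-in for Quranic tokens
    reference = ["قلب", "كتابات"]          # toy stand-in for an MSA resource
    for name, fn in [("surface", strip_diacritics), ("stem", light_stem), ("lemma", lemma)]:
        print(name, round(coverage(source, reference, fn), 2))
    # prints: surface 0.0 / stem 0.67 / lemma 1.0 (one per line)
```

With this toy data, the light stemmer maps "الكتاب" and "كتابات" to "كتاب" but keeps the broken plural "القلوب" apart from "قلبه"; only the lemma lookup maps both to "قلب". This is why coverage rises from the surface level to the stem level to the lemma level in the sketch.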

Keywords:
Lemmatisation, Computer science, Arabic, Lemma, Natural language processing, Artificial intelligence, Semitic languages, The Internet, Information retrieval, Linguistics, World Wide Web

Metrics

Cited by: 17
FWCI (Field Weighted Citation Impact): 0.92
References: 72
Citation Normalized Percentile: 0.80

Topics

Text and Document Classification Technologies
Physical Sciences → Computer Science → Artificial Intelligence
Advanced Text Analysis Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Information Retrieval and Search Behavior
Physical Sciences → Computer Science → Information Systems

Related Documents

BOOK-CHAPTER

Light Stemming for Arabic Information Retrieval

Leah S. Larkey, Lisa Ballesteros, Margaret E. Connell

Text, Speech and Language Technology; Year: 2007; Pages: 221-243

JOURNAL ARTICLE

Improving stemming for Arabic information retrieval

Leah S. Larkey, Lisa Ballesteros, Margaret E. Connell

Journal: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02); Year: 2002

BOOK-CHAPTER

Stemming and Lemmatization for Information Retrieval Systems in Amazigh Language

Samir Amri, Lahbib Zenkouar

Communications in Computer and Information Science; Year: 2018; Pages: 222-233

JOURNAL ARTICLE

Stemming and Lemmatization: A Comparison of Retrieval Performances

Vimala Balakrishnan, Lloyd-Yemoh Ethel

Journal: Lecture Notes on Software Engineering; Year: 2014; Vol: 2 (3); Pages: 262-267