JOURNAL ARTICLE

Improving stemming for Arabic information retrieval

Leah S. Larkey, Lisa Ballesteros, Margaret E. Connell

Year: 2002 Journal: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '02

Abstract

Arabic, a highly inflected language, requires good stemming for effective information retrieval, yet no standard approach to stemming has emerged. We developed several light stemmers based on heuristics and a statistical stemmer based on co-occurrence for Arabic retrieval. We compared the retrieval effectiveness of our stemmers and of a morphological analyzer on the TREC-2001 data. The best light stemmer was more effective for cross-language retrieval than a morphological stemmer which tried to find the root for each word. A repartitioning process consisting of vowel removal followed by clustering using co-occurrence analysis produced stem classes which were better than no stemming or very light stemming, but still inferior to good light stemming or morphological analysis.
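
The idea of a heuristic light stemmer mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not list the affixes or thresholds the authors used, so the prefix list, suffix list, minimum stem length, and the function name `light_stem` below are all assumptions chosen for illustration, not the paper's actual stemmers. The sketch strips at most one common prefix (definite article and conjunction or preposition combinations) and at most one common suffix, and never attempts to find the root.

```python
# Illustrative affix lists: assumptions for this sketch, not taken from the paper.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")
MIN_STEM_LEN = 2  # assumed minimum number of characters left after stripping


def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix from an unvocalized Arabic word."""
    # Try longer prefixes first so that "وال" wins over "ال".
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    # Try longer suffixes first so that "ية" wins over "ة".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word


if __name__ == "__main__":
    for w in ("الكتاب", "والكتاب", "المكتبة", "كتابها"):
        print(w, "->", light_stem(w))
```

Unlike a morphological analyzer, this kind of stemmer leaves infixes and broken plurals untouched, so words sharing a root may still fall into different stem classes.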

Keywords:
Arabic, Computer science, Information retrieval, Natural language processing, Artificial intelligence, Linguistics

Metrics

Cited By: 16
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.62

Topics

Information Retrieval and Search Behavior
Physical Sciences → Computer Science → Information Systems
Advanced Text Analysis Techniques
Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

BOOK-CHAPTER

Light Stemming for Arabic Information Retrieval

Leah S. Larkey, Lisa Ballesteros, Margaret E. Connell

Text, speech and language technology Year: 2007 Pages: 221-243

JOURNAL ARTICLE

Improving stemming for Assamese information retrieval

Arjun Gogoi, Nomi Baruah, Sikhar Kr. Sarma, Rakhee D. Phukan

Journal: International Journal of Information Technology Year: 2021 Vol: 13 (5) Pages: 1763-1768

BOOK-CHAPTER

Enhanced Arabic Information Retrieval: Light Stemming and Stop Words

Jaffar Atwan, Masnizah Mohd, Ghassan Kanaan

Communications in computer and information science Year: 2013 Pages: 219-228