Leah S. Larkey, Lisa Ballesteros, Margaret E. Connell
Arabic, a highly inflected language, requires good stemming for effective information retrieval, yet no standard approach to stemming has emerged. We developed several light stemmers based on heuristics and a statistical stemmer based on co-occurrence for Arabic retrieval. We compared the retrieval effectiveness of our stemmers and of a morphological analyzer on the TREC-2001 data. The best light stemmer was more effective for cross-language retrieval than a morphological stemmer which tried to find the root for each word. A repartitioning process consisting of vowel removal followed by clustering using co-occurrence analysis produced stem classes which were better than no stemming or very light stemming, but still inferior to good light stemming or morphological analysis.
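The abstract names two techniques without spelling them out: light stemming (stripping a small set of prefixes and suffixes by heuristic rules) and repartitioning (conflating words by vowel removal, then splitting each class by co-occurrence). The Python sketch below only illustrates the general shape of both. Everything in it is an assumption and not the authors' implementation: the prefix and suffix lists, the minimum-length checks, the choice of alef, waw and yeh as the "vowels" removed, the simplified co-occurrence score n_ab / (n_a + n_b), the threshold, and every function name (`light_stem`, `vowel_key`, `em`, `split_class`, `repartition`).

```python
from itertools import combinations

# Hypothetical affix lists, loosely modeled on Arabic light stemming.
# They are NOT the exact lists used in the paper.
PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال"]  # definite-article forms
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]
LONG_VOWELS = set("اوي")  # assumed: alef, waw, yeh treated as vowels


def light_stem(word: str) -> str:
    """Heuristic light stemmer: strip one prefix, then suffixes in order."""
    # Drop a leading waw ("and") only if at least 3 characters remain.
    if word.startswith("و") and len(word) - 1 >= 3:
        word = word[1:]
    # Strip at most one article-like prefix, keeping at least 2 characters.
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 2:
            word = word[len(p):]
            break
    # Try each suffix once, in list order, keeping at least 2 characters.
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 2:
            word = word[: -len(s)]
    return word


def vowel_key(word: str) -> str:
    """Initial conflation key: the word with long vowels removed."""
    return "".join(ch for ch in word if ch not in LONG_VOWELS)


def em(a: str, b: str, docs: list[set[str]]) -> float:
    """Simplified co-occurrence score n_ab / (n_a + n_b) over documents."""
    n_a = sum(a in d for d in docs)
    n_b = sum(b in d for d in docs)
    n_ab = sum(a in d and b in d for d in docs)
    return n_ab / (n_a + n_b) if n_a + n_b else 0.0


def split_class(words: list[str], docs: list[set[str]], threshold: float) -> list[list[str]]:
    """Split one class into connected components of the em > threshold graph."""
    parent = {w: w for w in words}

    def find(w: str) -> str:
        # Union-find root lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        if em(a, b, docs) > threshold:
            parent[find(a)] = find(b)
    groups: dict[str, list[str]] = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return sorted(sorted(g) for g in groups.values())


def repartition(words: list[str], docs: list[set[str]], threshold: float) -> list[list[str]]:
    """Group words by vowel_key, then split each group by co-occurrence."""
    classes: dict[str, list[str]] = {}
    for w in words:
        classes.setdefault(vowel_key(w), []).append(w)
    result: list[list[str]] = []
    for members in classes.values():
        result.extend(split_class(members, docs, threshold))
    return sorted(result)
```

For example, `light_stem("والكتاب")` strips the conjunction and the article to give `"كتاب"`, and `split_class(["a", "b", "c"], [{"a", "b"}, {"a", "b"}, {"c"}], 0.1)` keeps `a` and `b` together (score 0.5) but separates `c`. The design choice worth noting is that vowel removal only over-conflates, so the co-occurrence step can only split classes, never merge words that vowel removal kept apart.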