JOURNAL ARTICLE

Clustering Arabic Tweets for Sentiment Analysis

Abstract

This study evaluates the impact of linguistic preprocessing and similarity functions on clustering Arabic tweets. The experiments apply an optimized version of the standard K-Means algorithm to assign tweets to positive and negative categories. The results show that root-based stemming has a significant advantage over light stemming in all settings. The Averaged Kullback-Leibler Divergence similarity function clearly outperforms the Cosine, Pearson Correlation, Jaccard Coefficient and Euclidean functions. The combination of the Averaged Kullback-Leibler Divergence and root-based stemming achieved the highest purity, 0.764, while the second-best purity was 0.719. These results are notable because they run contrary to what is observed for normal-sized documents, where, in many information retrieval applications, light stemming performs better than root-based stemming and the Cosine function is the common choice.
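To make the compared measures and the evaluation metric concrete, the sketch below implements the five similarity/distance functions named in the abstract and the purity score. The abstract does not give the exact formulas, so this is a minimal sketch under stated assumptions: tweets are represented as equal-length, non-negative term-weight vectors over a shared vocabulary (for example, term frequencies after stemming), Jaccard is the extended (Tanimoto) form for weighted vectors, and the averaged Kullback-Leibler divergence uses the formulation commonly applied in document-clustering comparisons. All function names are hypothetical and are not taken from the authors' code.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity (higher = more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pearson(a, b):
    """Pearson correlation between two term-weight vectors (higher = more similar)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def extended_jaccard(a, b):
    """Extended (Tanimoto) Jaccard coefficient for weighted vectors (assumed variant)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0


def euclidean(a, b):
    """Euclidean distance (lower = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def avg_kl_divergence(a, b):
    """Averaged KL divergence (lower = more similar); assumed formulation.

    For each term: p1 = x/(x+y), p2 = y/(x+y), w = p1*x + p2*y, and the
    term contributes p1*x*log(x/w) + p2*y*log(y/w). Terms absent from both
    vectors are skipped; a zero weight contributes 0 (0*log 0 := 0).
    """
    total = 0.0
    for x, y in zip(a, b):
        if x + y == 0:
            continue
        p1, p2 = x / (x + y), y / (x + y)
        w = p1 * x + p2 * y  # > 0 whenever x or y is positive
        if x > 0:
            total += p1 * x * math.log(x / w)
        if y > 0:
            total += p2 * y * math.log(y / w)
    return total


def purity(clusters, labels):
    """Fraction of items whose label matches the majority label of their cluster."""
    if not labels:
        raise ValueError("labels must be non-empty")
    groups = {}
    for c, l in zip(clusters, labels):
        groups.setdefault(c, Counter())[l] += 1
    return sum(cnt.most_common(1)[0][1] for cnt in groups.values()) / len(labels)


if __name__ == "__main__":
    print(purity([0, 0, 0, 1, 1, 1], ["pos", "pos", "neg", "neg", "neg", "pos"]))  # 0.6666666666666666
```

Note that Euclidean distance and the averaged KL divergence are distances, whereas the other three are similarities. Inside a K-Means assignment step, a tweet would therefore go to the centroid with the lowest value for the former and the highest value for the latter.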

Keywords:
Jaccard index, Cluster analysis, Cosine similarity, Euclidean distance, Pearson product-moment correlation coefficient, Arabic, Sentiment analysis, Natural language processing, Computer science, Artificial intelligence, Mathematics, Statistics, Linguistics

Metrics

Cited by: 17
FWCI (Field Weighted Citation Impact): 1.83
References: 32
Citation Normalized Percentile: 0.88

Topics

Sentiment Analysis and Opinion Mining
Advanced Text Analysis Techniques
Text and Document Classification Technologies

Related Documents

BOOK-CHAPTER

Twitter Sentiment Analysis for Arabic Tweets

Sherihan Abuelenin, Samir Elmougy, Eman Naguib

Advances in Intelligent Systems and Computing, 2017, pp. 467-476

BOOK-CHAPTER

Sentiment Analysis of Arabic and English Tweets

Mohamed K. Elhadad, Kin Fun Li, Fayez Gebali

Advances in Intelligent Systems and Computing, 2019, pp. 334-348

BOOK-CHAPTER

Improving Sentiment Analysis of Arabic Tweets

Abdulrahman Alruban, Muhammed Abduallah, Gueltoum Bendiab, Stavros Shiaeles, Marco A. Palomino

Communications in Computer and Information Science, 2020, pp. 146-158

BOOK-CHAPTER

Sentiment Analysis of Arabic COVID-19 Tweets

Dena Ahmed, Said A. Salloum, Khaled Shaalan

Lecture Notes in Networks and Systems, 2021, pp. 623-632