JOURNAL ARTICLE

Advancing sentiment analysis for low-resourced African languages using pre-trained language models

Koena Ronny Mabokela, Mpho Primus, Turgay Çelik

Year: 2025   Journal: PLoS ONE   Vol: 20 (6)   Pages: e0325102   Publisher: Public Library of Science

Abstract

While sentiment analysis systems excel in high-resource languages, most African languages have limited resources and remain under-represented. This gap leaves a significant portion of the world’s population without access to technologies in their native languages. Multilingual pre-trained language models (PLMs) offer a promising approach to sentiment analysis in low-resource languages. Although the lack of large datasets for African languages makes it difficult to develop new PLMs, fine-tuning and task adaptation of existing multilingual PLMs is an alternative. This paper explores the use of multilingual PLMs for sentiment analysis in five Southern African languages: Sepedi, Sesotho, Setswana, isiXhosa, and isiZulu. We fine-tune existing PLMs for this task rather than training models from scratch. Our work expands the SAfriSenti corpus, a Twitter sentiment dataset for these languages. We employ various annotation techniques to create a labelled dataset and perform benchmark experiments with several multilingual PLMs. Our findings demonstrate the effectiveness of multilingual PLMs for closely related languages: for the Sotho-Tswana languages, the ensemble PLM method achieved an average weighted F1 score above 63%, and the closely related Nguni languages achieved an even higher average weighted F1 score, exceeding 77%. These results highlight the potential of PLMs for sentiment analysis in South African languages.
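
The abstract reports results as weighted F1 scores. As a minimal sketch of how that metric is commonly computed (per-class F1 averaged with weights proportional to each class's number of true examples), the following may help; the function name `weighted_f1` and the toy labels are hypothetical illustrations, not the paper's evaluation code or data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        # True positives: predicted this class and it was correct.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        # False positives: predicted this class for an example of another class.
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        # False negatives: examples of this class predicted as something else.
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Weight each class's F1 by its share of the true labels.
        score += (n / total) * f1
    return score


# Hypothetical toy labels (assumption: three sentiment classes), not data from the paper.
y_true = ["pos", "pos", "neg", "neu", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "neu", "pos", "pos"]
print(round(weighted_f1(y_true, y_pred), 3))
```

Because the weights follow class frequency, a model that does well on the majority sentiment class can score high even if a minority class is poorly predicted, which is worth keeping in mind when comparing averages across languages.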

Keywords:
Computer science; Leverage (statistics); Languages of Africa; Natural language processing; Sentiment analysis; Artificial intelligence; Lemmatisation; Annotation; Task (project management); Population; Benchmark (surveying); Linguistics; Geography

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 19.28
Refs: 78
Citation Normalized Percentile: 0.99

Topics

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Sentiment Analysis and Opinion Mining (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)