Advancing sentiment analysis for low-resourced African languages using pre-trained language models

Koena Ronny Mabokela; Mpho Primus; Turgay Çelik

doi:10.1371/journal.pone.0325102


JOURNAL ARTICLE


Year: 2025 | Journal: PLoS ONE | Vol: 20 (6) | Pages: e0325102 | Publisher: Public Library of Science


Abstract

While sentiment analysis systems excel in high-resource languages, most African languages, which face limited resources, remain under-represented. This gap leaves a significant portion of the world's population without access to language technologies in their native languages. Multilingual pre-trained language models (PLMs) offer a promising approach to sentiment analysis in low-resource languages. Although the scarcity of large datasets in African languages makes it difficult to develop dedicated PLMs, fine-tuning and task adaptation of existing multilingual PLMs offer an alternative. This paper explores the use of multilingual PLMs for sentiment analysis in five Southern African languages: Sepedi, Sesotho, Setswana, isiXhosa, and isiZulu. We fine-tune existing PLMs for this task rather than training models from scratch. Our work expands on the SAfriSenti corpus, a Twitter sentiment dataset for these languages. We employ various annotation techniques to create a labelled dataset and run benchmark experiments with several multilingual PLMs. Our findings demonstrate the effectiveness of multilingual PLMs, particularly for closely related languages: for the Sotho-Tswana languages (Sepedi, Sesotho, Setswana), the ensemble PLM method achieved an average weighted F1 score above 63%, and for the closely related Nguni languages (isiXhosa, isiZulu) the average weighted F1 score was even higher, exceeding 77%. These results highlight the potential of PLMs for sentiment analysis in South African languages.
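The abstract reports ensemble results as average weighted F1 scores. The paper's exact ensembling strategy and label set are not stated here, so the following is only a minimal sketch assuming hard majority voting over per-model predictions and a negative/neutral/positive label set; the function names `majority_vote` and `weighted_f1` and the toy predictions are hypothetical. Weighted F1 averages the per-class F1 scores, weighting each class by its number of gold examples, which is the quantity scikit-learn computes with `f1_score(..., average="weighted")`.

```python
from collections import Counter

# Assumptions (not from the paper): ensemble = hard majority voting,
# labels = "negative" / "neutral" / "positive".


def majority_vote(predictions_per_model):
    """Hard-voting ensemble: pick the most frequent label per example.

    predictions_per_model is a list of equally long label lists, one per model.
    On a tie, the earliest model (in list order) whose label is among the tied
    labels wins; this tie-break rule is an illustrative assumption.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top = max(counts.values())
        # votes keeps model order, so the first match is the earliest tied model
        combined.append(next(label for label in votes if counts[label] == top))
    return combined


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n / total
    return score


# Toy predictions from three hypothetical fine-tuned models (invented data).
gold = ["positive", "negative", "neutral", "positive"]
model_a = ["positive", "negative", "positive", "positive"]
model_b = ["positive", "neutral", "neutral", "negative"]
model_c = ["negative", "negative", "neutral", "positive"]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
print(round(weighted_f1(gold, model_a), 4))
print(round(weighted_f1(gold, ensemble), 4))
```

On this invented toy data, `model_a` alone scores a weighted F1 of 0.65, while the majority-vote ensemble recovers all four gold labels and scores 1.0; real gains from ensembling depend on how often the individual models make different errors.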

Keywords:

Computer science, Leverage (statistics), Languages of Africa, Natural language processing, Sentiment analysis, Artificial intelligence, Lemmatisation, Annotation, Task (project management), Population, Benchmark (surveying), Linguistics, Geography

Metrics

FWCI (Field Weighted Citation Impact): 19.28

Citation Normalized Percentile: 0.99

Is in top 1%

Is in top 10%


Topics

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Sentiment Analysis and Opinion Mining

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

