Koena Ronny Mabokela, Mpho Primus, Turgay Çelik
While sentiment analysis systems excel in high-resource languages, most African languages have limited resources and remain under-represented. This gap leaves a significant portion of the world’s population without access to technologies in their native languages. Multilingual pre-trained language models (PLMs), however, offer a promising approach to sentiment analysis in low-resource languages. Although the absence of large datasets in African languages poses a challenge for developing PLMs, fine-tuning and task adaptation of existing multilingual PLMs offer an alternative. This paper explores the use of multilingual PLMs for sentiment analysis in five Southern African languages: Sepedi, Sesotho, Setswana, isiXhosa, and isiZulu. We fine-tune existing PLMs for this task rather than training models from scratch. Our work expands on the SAfriSenti corpus, a Twitter sentiment dataset for these languages. We employ various annotation techniques to create a labelled dataset and perform benchmark experiments with various multilingual PLMs. Our findings demonstrate the effectiveness of multilingual PLMs, particularly for closely related languages: in the Sotho-Tswana group, the ensemble PLM method achieved an average weighted F1 score above 63%. The closely related Nguni languages achieved an even higher average weighted F1 score, exceeding 77%, highlighting the potential of PLMs for sentiment analysis in South African languages.
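The results above are reported as weighted F1 scores. As a minimal sketch of how that metric is commonly computed (the per-class F1 averaged with weights proportional to each class's share of the gold labels), the following may help. The function name `weighted_f1` and the toy labels are illustrative assumptions, not the paper's evaluation code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores.

    Hypothetical helper for illustration; it assumes the usual
    definition of weighted F1, not the authors' own script.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    support = Counter(y_true)  # number of gold examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its support
    return score


# Toy three-class sentiment example (made-up labels).
gold = ["pos", "pos", "neg", "neu"]
pred = ["pos", "neg", "neg", "neu"]
print(weighted_f1(gold, pred))
```

Weighting by support keeps a frequent class from being drowned out by rare ones, which matters when sentiment classes are imbalanced, as is often the case with tweets.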