JOURNAL ARTICLE

Pre-Trained Transformer-Based Models for Text Classification Using Low-Resourced Ewe Language

Abstract

Although there have been a few attempts to automatically crawl Ewe text from online news portals and magazines, the African Ewe language remains computationally underdeveloped in spite of its rich morphology and complex, unique structure. The crawled Ewe texts are of poor quality, unbalanced, and predominantly religious in content, which makes them difficult to preprocess and unsuitable for NLP tasks with current transformer-based language models. In this study, we present a well-preprocessed Ewe dataset for low-resource text classification to the research community. We also develop an Ewe word embedding to provide a semantic representation for this low-resource language. Finally, we fine-tune seven transformer-based models, namely BERT-base (cased and uncased), DistilBERT-base (cased and uncased), RoBERTa, DistilRoBERTa, and DeBERTa, on the proposed Ewe dataset. Extensive experiments show that the fine-tuned BERT-base-cased model outperforms all baseline models, achieving an accuracy of 0.972, a precision of 0.969, a recall of 0.970, an F1-score of 0.970, and a loss of 0.021. This performance demonstrates the model's ability to capture the semantics of low-resourced Ewe better than the other models, establishing the fine-tuned BERT-base-cased model as the benchmark for the proposed Ewe dataset.
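
The headline result, fine-tuning bert-base-cased for Ewe text classification, follows the standard sequence-classification recipe. The sketch below illustrates that recipe using the Hugging Face transformers and datasets libraries; the file name "ewe_news.csv", its "text"/"label" columns, the four-class label set, and the hyperparameters are hypothetical stand-ins, since the listing does not specify them.

# A minimal sketch of the fine-tuning setup described in the abstract, assuming
# the Hugging Face `transformers` and `datasets` libraries. The dataset file,
# its column names, the label count, and the hyperparameters are hypothetical.
import numpy as np
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "bert-base-cased"  # best-performing model reported in the paper
NUM_LABELS = 4                  # hypothetical number of Ewe text classes

# Load the (hypothetical) preprocessed Ewe corpus and hold out 20% for evaluation.
dataset = load_dataset("csv", data_files="ewe_news.csv")["train"]
splits = dataset.train_test_split(test_size=0.2, seed=42)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    # Truncate/pad Ewe sentences to a fixed length so they batch cleanly.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

encoded = splits.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_LABELS)

def compute_metrics(eval_pred):
    # Accuracy only, for brevity; the paper also reports precision, recall, and F1.
    logits, labels = eval_pred
    return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ewe-bert", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=encoded["train"],
    eval_dataset=encoded["test"],
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())

The same loop would be repeated for the other six checkpoints (e.g. "distilbert-base-cased", "roberta-base", "microsoft/deberta-base") by swapping MODEL_NAME, which is how the paper's model comparison is typically reproduced.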

Keywords:
Transformer, Language model, Natural language processing, Word embedding, F1 score, Benchmark, Artificial intelligence, Computer science

Metrics

Cited By: 9
FWCI (Field-Weighted Citation Impact): 2.30
References: 46
Citation Normalized Percentile: 0.88

Topics

Text and Document Classification Technologies (Physical Sciences → Computer Science → Artificial Intelligence)
Religion and Sociopolitical Dynamics in Nigeria (Social Sciences → General Social Sciences)
Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)

Related Documents

JOURNAL ARTICLE

Multilingual Text Summarization in Healthcare Using Pre-Trained Transformer-Based Language Models

Josua Käser, Thomas Nagy, Patrick Stirnemann, Thomas Hanne

Journal:   Computers, Materials & Continua   Year: 2025   Vol: 83 (1)   Pages: 201-217
JOURNAL ARTICLE

Pre-trained transformer-based language models for Sundanese

Wilson Wongso, Henry Lucky, Derwin Suhartono

Journal:   Journal of Big Data   Year: 2022   Vol: 9 (1)
JOURNAL ARTICLE

Explainable Pre-Trained Language Models for Sentiment Analysis in Low-Resourced Languages

Koena Ronny Mabokela, Mpho Primus, Turgay Çelik

Journal:   Big Data and Cognitive Computing   Year: 2024   Vol: 8 (11)   Pages: 160