Λαυρεντιάδου, Βασιλική Γεωργίου
The constant expansion and creation of online social platforms attracts people of different nationalities, religions, sexual orientations, and social backgrounds. Society’s lack of respect for and understanding of these differences breeds negativity, general dissatisfaction, and, eventually, hate speech. Social media companies use automated hate speech detection systems to limit the spread of such abusive content. These systems, however, rely heavily on the volume and quality of their training data. Although many public datasets for hate speech detection are available, they are largely limited to widely spoken languages such as English, French, and Spanish. In our approach, we aim to accurately identify hateful content in low-resource languages with the help of pre-trained transformer models. Both monolingual and multilingual models are deployed for this task. We primarily seek to learn whether fine-tuning multilingual models on linguistically related languages yields better results than language-specific pre-trained monolingual models. Additionally, we examine the contribution of multiple languages to the fine-tuning process. Across all the experiments performed, we observe that multilinguality does indeed improve performance in a few cases.
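To make the cross-lingual fine-tuning idea concrete, the sketch below shows one way a mixed training corpus could be assembled from a target language and its linguistic relatives before fine-tuning a multilingual model. The language grouping, example data, and function names here are illustrative assumptions, not the datasets or code of the study itself.

```python
# Sketch: building a fine-tuning corpus from linguistically related
# languages for a multilingual hate-speech classifier.
# The grouping and toy examples below are hypothetical, chosen only
# to illustrate the data-mixing step described in the abstract.

RELATED_LANGUAGES = {
    # Hypothetical grouping: Greek with two nearby Balkan languages.
    "el": ["el", "bg", "mk"],
}

def build_corpus(datasets, target_lang, related=RELATED_LANGUAGES):
    """Concatenate labelled examples from the target language and its
    (assumed) relatives, tagging each example with its source language."""
    corpus = []
    for lang in related.get(target_lang, [target_lang]):
        for text, label in datasets.get(lang, []):
            corpus.append({"lang": lang, "text": text, "label": label})
    return corpus

# Toy labelled data: 1 = hateful, 0 = not hateful (placeholders only).
datasets = {
    "el": [("παράδειγμα 1", 1), ("παράδειγμα 2", 0)],
    "bg": [("пример 1", 1)],
    "mk": [("пример 2", 0)],
}

corpus = build_corpus(datasets, "el")
# The mixed corpus would then be tokenised and passed to a multilingual
# transformer (e.g. an XLM-R-style model) for fine-tuning.
```

A monolingual baseline would simply restrict `RELATED_LANGUAGES["el"]` to `["el"]`, which is how the comparison between the two settings can be framed.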
Sapthak Mohajon Turjya, Rina Kumari, Sujata Swain, Anjan Bandyopadhyay