Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi

Saurabh Gaikwad; Tharindu Ranasinghe; Marcos Zampieri; Christopher M. Homan

doi:10.26615/978-954-452-072-4_050


Year: 2021 Pages: 437-443

Abstract

The widespread presence of offensive language on social media has motivated the development of systems capable of recognizing such content automatically. Apart from a few notable exceptions, most research on automatic offensive language identification has dealt with English. To address this shortcoming, we introduce MOLD, the Marathi Offensive Language Dataset. MOLD is the first dataset of its kind compiled for Marathi, thus opening a new domain for research in low-resource Indo-Aryan languages. We present results from several machine learning experiments on this dataset, including zero-shot and other transfer learning experiments on state-of-the-art cross-lingual transformers, transferring from existing data in Bengali, English, and Hindi.

Keywords:

Marathi Offensive Computer science Bengali Hindi Language identification Natural language processing Artificial intelligence Identification (biology) Natural language Linguistics Engineering

Metrics

FWCI (Field Weighted Citation Impact): 4.94

Citation Normalized Percentile: 0.96

Topics

Hate Speech and Cyberbullying Detection

Physical Sciences → Computer Science → Artificial Intelligence

Swearing, Euphemism, Multilingualism

Social Sciences → Social Sciences → Communication

Related Documents

Cross-lingual offensive speech identification with transfer learning for low-resource languages

Investigating cross-lingual training for offensive language detection

Hindi-Marathi Cross Lingual Model

Offensive language detection in low resource languages: A use case of Persian language

An Analysis of Cross-Lingual Natural Language Processing for Low-Resource Languages