The widespread presence of offensive language on social media has motivated the development of systems capable of recognizing such content automatically. Apart from a few notable exceptions, most research on automatic offensive language identification has dealt with English. To address this shortcoming, we introduce MOLD, the Marathi Offensive Language Dataset. MOLD is the first dataset of its kind compiled for Marathi, thus opening a new domain for research in low-resource Indo-Aryan languages. We present results from several machine learning experiments on this dataset, including zero-shot and other transfer learning experiments with state-of-the-art cross-lingual transformers using existing data in Bengali, English, and Hindi.