Internet and social media usage has skyrocketed over the past two decades, fundamentally changing how people communicate with one another. This has brought many favourable results, but it also carries risks and harms. The volume of damaging content available online, such as hate speech, is too large for humans to moderate manually, so automated methods for hate speech identification have drawn growing attention from academics. In this work, we investigate various publicly accessible datasets and combine them into a single homogeneous dataset. After classifying the data into two categories, hate or non-hate, we establish a baseline model and improve its performance scores using various optimisation strategies. Having achieved a competitive performance score, we develop a tool that quickly locates and evaluates a page with an effective measure, then uses that same feedback to retrain our model on the new data. We demonstrate the performance of our multilingual approach in three languages, English, German, and Spanish: it is equal to or better than that of most monolingual models.
Mithun Das, Somnath Banerjee, Punyajoy Saha, Animesh Mukherjee
Λαυρεντιάδου, Βασιλική Γεωργίου
Hammad Rizwan, Muhammad Haroon Shakeel, Asim Karim