Recently, cyber-bullying and online harassment have become two of the most serious issues in many public online communities. In this paper, we use data from Wikipedia talk page edits to train a multi-label classifier that detects different types of toxicity in online user-generated content. We present different data augmentation techniques to overcome the data imbalance problem in the Wikipedia dataset. The proposed solution is an ensemble of three models: a convolutional neural network (CNN), a bidirectional long short-term memory (LSTM) network, and a bidirectional gated recurrent unit (GRU) network. We divide the classification problem into two steps: first, we determine whether or not the input is toxic; then, we find the types of toxicity present in the toxic content. The evaluation results show that the proposed ensemble approach provides the highest accuracy among all considered algorithms. It achieves an F1-score of 0.828 for toxic/non-toxic classification and 0.872 for toxicity type prediction.
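The two-step scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the three models' outputs are combined, so simple probability averaging is assumed here, along with a decision threshold of 0.5. The function names, the label names in `TYPE_LABELS`, and all probability values are hypothetical placeholders standing in for the outputs of trained CNN, LSTM and GRU models.

```python
import numpy as np

# Hypothetical toxicity-type names; the abstract does not list the actual labels.
TYPE_LABELS = ["type_a", "type_b", "type_c"]


def ensemble_mean(prob_list):
    """Average the per-model probabilities (assumed combination rule)."""
    return np.mean(np.stack(prob_list, axis=0), axis=0)


def two_step_predict(toxic_probs, type_probs, threshold=0.5):
    """Two-step prediction over a batch of n comments.

    toxic_probs: list of arrays of shape (n,), one per model (step 1).
    type_probs:  list of arrays of shape (n, k), one per model (step 2).
    Returns a boolean array (n,) and a 0/1 array (n, k).
    """
    # Step 1: toxic vs. non-toxic decision from the averaged probability.
    is_toxic = ensemble_mean(toxic_probs) >= threshold
    # Step 2: toxicity types, kept only for comments flagged as toxic.
    types = (ensemble_mean(type_probs) >= threshold) & is_toxic[:, None]
    return is_toxic, types.astype(int)


# Made-up model outputs for two comments (CNN, LSTM, GRU in that order).
toxic_probs = [np.array([0.9, 0.2]), np.array([0.8, 0.1]), np.array([0.7, 0.3])]
type_probs = [
    np.array([[0.9, 0.1, 0.7], [0.8, 0.7, 0.9]]),
    np.array([[0.7, 0.2, 0.6], [0.6, 0.9, 0.8]]),
    np.array([[0.8, 0.3, 0.8], [0.7, 0.8, 0.7]]),
]

is_toxic, types = two_step_predict(toxic_probs, type_probs)
for flag, row in zip(is_toxic, types):
    found = [name for name, hit in zip(TYPE_LABELS, row) if hit]
    print("toxic" if flag else "non-toxic", found)
```

In this sketch the second comment receives no type labels even though its type probabilities are high, because the first step classifies it as non-toxic. That gating is the point of splitting the problem into two steps.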
Nihaya S. Salih, Dindar M. Ahmed
Yilin Yan, Min Chen, Mei‐Ling Shyu, Shu‐Ching Chen