Imbalanced Toxic Comments Classification Using Data Augmentation and Deep Learning

Mai Ibrahim; Marwan Torki; Nagwa El-Makky

doi:10.1109/icmla.2018.00141

JOURNAL ARTICLE

Year: 2018 Pages: 875-878

Abstract

Cyber-bullying and online harassment have recently become two of the most serious issues in many public online communities. In this paper, we use data from Wikipedia talk page edits to train a multi-label classifier that detects different types of toxicity in online user-generated content. We present different data augmentation techniques to overcome the data imbalance problem in the Wikipedia dataset. The proposed solution is an ensemble of three models: a convolutional neural network (CNN), a bidirectional long short-term memory (LSTM) network, and a bidirectional gated recurrent unit (GRU) network. We divide the classification problem into two steps: first we determine whether or not the input is toxic, then we identify the types of toxicity present in the toxic content. The evaluation results show that the proposed ensemble approach provides the highest accuracy among all considered algorithms. It achieves an F1-score of 0.828 for toxic/non-toxic classification and 0.872 for toxicity type prediction.
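
The two-step scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the three models' outputs are combined, so simple probability averaging is assumed here, and the function name `two_step_predict`, the 0.5 threshold, and the example type labels are all hypothetical. Each "model" is just a callable standing in for a trained CNN, bidirectional LSTM, or bidirectional GRU.

```python
from typing import Callable, Dict, List, Sequence

# Stand-ins for trained networks (assumption: each exposes a probability output).
BinaryModel = Callable[[str], float]            # comment -> P(toxic)
TypeModel = Callable[[str], Dict[str, float]]   # comment -> P(type) per label


def average(values: Sequence[float]) -> float:
    """Arithmetic mean; the ensemble rule assumed for this sketch."""
    return sum(values) / len(values)


def two_step_predict(
    comment: str,
    toxic_models: List[BinaryModel],
    type_models: List[TypeModel],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> dict:
    # Step 1: toxic vs. non-toxic, averaged over the ensemble.
    p_toxic = average([model(comment) for model in toxic_models])
    if p_toxic < threshold:
        return {"toxic": False, "p_toxic": p_toxic, "types": []}

    # Step 2: only toxic comments reach the multi-label type classifiers.
    per_model = [model(comment) for model in type_models]
    labels = per_model[0].keys()
    p_types = {
        label: average([probs[label] for probs in per_model])
        for label in labels
    }
    types = sorted(label for label, p in p_types.items() if p >= threshold)
    return {"toxic": True, "p_toxic": p_toxic, "types": types}


if __name__ == "__main__":
    # Constant stubs in place of real networks; labels are illustrative only.
    toxic_models = [lambda c: 0.9, lambda c: 0.7, lambda c: 0.8]
    type_models = [
        lambda c: {"insult": 0.9, "threat": 0.1},
        lambda c: {"insult": 0.6, "threat": 0.3},
        lambda c: {"insult": 0.7, "threat": 0.2},
    ]
    print(two_step_predict("example comment", toxic_models, type_models))
```

One consequence of this structure is that the second-stage models only ever see comments the first stage flagged, so a first-stage false negative cannot be recovered later.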

Keywords:

Computer science; Classifier (UML); Convolutional neural network; Artificial intelligence; Machine learning; Deep learning; Data mining

Metrics

Cited By: 110

FWCI (Field Weighted Citation Impact): 7.35

Citation Normalized Percentile: 0.97

Topics

Hate Speech and Cyberbullying Detection

Physical Sciences → Computer Science → Artificial Intelligence

Software Engineering Research

Physical Sciences → Computer Science → Information Systems

Spam and Phishing Detection

Physical Sciences → Computer Science → Information Systems
