JOURNAL ARTICLE

Imbalanced Toxic Comments Classification Using Data Augmentation and Deep Learning

Abstract

Recently, cyberbullying and online harassment have become two of the most serious issues in many public online communities. In this paper, we use data from Wikipedia talk page edits to train a multi-label classifier that detects different types of toxicity in online user-generated content. We present several data augmentation techniques to overcome the class imbalance problem in the Wikipedia dataset. The proposed solution is an ensemble of three models: a convolutional neural network (CNN), a bidirectional long short-term memory (LSTM) network, and a bidirectional gated recurrent unit (GRU) network. We divide the classification problem into two steps: first, we determine whether the input is toxic; then, we identify the types of toxicity present in the toxic content. The evaluation results show that the proposed ensemble approach provides the highest accuracy among all the algorithms considered, achieving an F1-score of 0.828 for toxic/non-toxic classification and 0.872 for toxicity-type prediction.
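The two-step scheme described in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: the three trained models (CNN, bidirectional LSTM, bidirectional GRU) are replaced by hard-coded probability vectors, the ensemble is assumed to combine them by simple averaging (soft voting), a 0.5 decision threshold is assumed, and the type names in `TOXICITY_TYPES` are hypothetical placeholders. The `f1_score` helper computes the binary F1-score, the metric the abstract reports for the toxic/non-toxic step.

```python
from typing import Dict, List, Sequence

# Hypothetical toxicity-type names for illustration only; the abstract does not
# list the labels used in the paper.
TOXICITY_TYPES = ["severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def average_probs(model_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Soft voting (an assumed combination rule): element-wise mean over models."""
    n_models = len(model_outputs)
    return [sum(column) / n_models for column in zip(*model_outputs)]


def two_step_predict(
    toxic_probs: Sequence[float],
    type_probs: Sequence[Sequence[float]],
    threshold: float = 0.5,  # assumed decision threshold
) -> Dict[str, object]:
    """Step 1: toxic vs. non-toxic. Step 2: toxicity types, only if step 1 says toxic.

    toxic_probs: P(toxic) from each ensemble member (e.g. CNN, BiLSTM, BiGRU).
    type_probs:  one per-type probability vector from each ensemble member.
    """
    p_toxic = sum(toxic_probs) / len(toxic_probs)
    if p_toxic < threshold:
        return {"toxic": False, "types": []}
    p_types = average_probs(type_probs)
    # Multi-label decision: keep every type whose averaged probability clears the threshold.
    types = [name for name, p in zip(TOXICITY_TYPES, p_types) if p >= threshold]
    return {"toxic": True, "types": types}


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 = 2 * precision * recall / (precision + recall), positive class = 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Usage with made-up probabilities standing in for three trained models.
result = two_step_predict(
    toxic_probs=[0.9, 0.8, 0.7],
    type_probs=[
        [0.1, 0.9, 0.0, 0.7, 0.0],  # model 1
        [0.2, 0.7, 0.1, 0.6, 0.0],  # model 2
        [0.0, 0.8, 0.0, 0.8, 0.1],  # model 3
    ],
)
print(result)  # {'toxic': True, 'types': ['obscene', 'insult']}
```

Because the second step runs only on inputs that the first step flags as toxic, a non-toxic prediction always returns an empty type list, mirroring the "first toxic or not, then which types" split described above.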

Keywords:
Computer science, Classifier, Convolutional neural network, Artificial intelligence, Machine learning, Deep learning, Data mining

Metrics

Cited by: 110
FWCI (Field-Weighted Citation Impact): 7.35
References: 9
Citation Normalized Percentile: 0.97

Topics

Hate Speech and Cyberbullying Detection (Physical Sciences → Computer Science → Artificial Intelligence)
Software Engineering Research (Physical Sciences → Computer Science → Information Systems)
Spam and Phishing Detection (Physical Sciences → Computer Science → Information Systems)

Related Documents

JOURNAL ARTICLE
Improving Imbalanced Data Classification Using Deep Learning
Nihaya S. Salih, Dindar M. Ahmed
International Journal of Computational and Experimental Science and Engineering, 2025, Vol. 11 (3)

BOOK CHAPTER
Ensemble Classification Method for Imbalanced Data Using Deep Learning
Yoon Sang Lee
Lecture Notes in Business Information Processing, 2019, pp. 162-170

JOURNAL ARTICLE
Classification of Imbalanced Data Using Deep Learning with Adding Noise
Wan-Wei Fan, Ching-Hung Lee
Journal of Sensors, 2021, Vol. 2021 (1)