JOURNAL ARTICLE

Topic Modeling based Text Classification Regarding Islamophobia using Word Embedding and Transformers Techniques

Ammar Saeed, Hikmat Ullah Khan, Achyut Shankar, Talha Imran, Danish Khan, M. Kamran, Muhammad Attique Khan

Year: 2023 Journal:   ACM Transactions on Asian and Low-Resource Language Information Processing   Publisher: Association for Computing Machinery

Abstract

Islamophobia is a rising area of concern in the current era, in which Muslims face discrimination and negative perceptions of their religion, Islam. Islamophobia is a form of racism practiced by individuals, groups, and organizations worldwide. Moreover, the ease of access to social media platforms and their growing usage have contributed to the spread of hate speech, false information, and negative opinions about Islam. In this research study, we focus on detecting Islamophobic textual content shared on various social media platforms. We explore state-of-the-art techniques in text data mining and Natural Language Processing (NLP). The topic modelling algorithm Latent Dirichlet Allocation (LDA) is used to find the top topics. Then, word embedding approaches such as Word2Vec and Global Vectors for Word Representation (GloVe) are used as feature extraction techniques. For text classification, we utilize modern transformer-based Deep Learning models, namely Bidirectional Encoder Representations from Transformers (BERT) and the Generative Pre-trained Transformer (GPT). For comparison, we conduct an extensive empirical analysis of Machine Learning and Deep Learning algorithms using conventional textual features such as Term Frequency-Inverse Document Frequency (TF-IDF), N-grams, and Bag of Words (BoW). The empirical results, evaluated using standard performance measures, show that the proposed approach effectively detects textual content related to Islamophobia. On the study corpus, the Support Vector Machine (SVM) performed best among the Machine Learning models with an F1 score of 91%. The transformer-based NLP models and the Deep Learning model, a Convolutional Neural Network (CNN) combined with GloVe, performed best among all techniques except SVM with BoW:
GPT, SVM combined with BoW, and BERT yielded the best F1 scores of 92%, 92%, and 91.9%, respectively, while the CNN performed slightly worse with an F1 score of 91%.
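The strongest conventional baseline reported above is SVM with Bag-of-Words features. A minimal sketch of that pipeline, using scikit-learn's CountVectorizer and LinearSVC; the tiny corpus and labels below are invented purely for illustration and are not from the paper's dataset:

```python
# Sketch of a Bag-of-Words + linear SVM text classifier,
# the conventional baseline reported in the abstract.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy illustrative corpus (labels: 1 = Islamophobic, 0 = neutral)
texts = [
    "hateful slur targeting muslims online",
    "mosque community hosts charity food drive",
    "spreading false claims about islam to incite fear",
    "local interfaith event welcomes all religions",
]
labels = [1, 0, 1, 0]

# CountVectorizer builds the BoW (here unigram + bigram) feature matrix;
# LinearSVC fits a linear-kernel SVM on top of it.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)

print(clf.predict(["spreading hateful claims about muslims"]))
```

On a real corpus the same pipeline would be evaluated with a train/test split and the F1 score, as in the paper's empirical comparison.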

Keywords:
Computer science; Word embedding; Artificial intelligence; Islamophobia; Natural language processing; Latent Dirichlet allocation; Word2vec; Sentiment analysis; Transformer; Feature learning; Social media; Support vector machine; Autoencoder; Deep learning; Machine learning; Topic model; Embedding; Engineering; World Wide Web; Islam

Metrics

Cited By: 12
FWCI (Field Weighted Citation Impact): 3.07
Refs: 46
Citation Normalized Percentile: 0.90 (in top 10%)

Topics

Hate Speech and Cyberbullying Detection
Physical Sciences →  Computer Science →  Artificial Intelligence
Terrorism, Counterterrorism, and Political Violence
Social Sciences →  Social Sciences →  Sociology and Political Science
Sentiment Analysis and Opinion Mining
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Short Text Classification Based on Latent Topic Modeling and Word Embedding

Peng Li, Junqing He, Chenglong Ma

Journal:   DEStech Transactions on Computer Science and Engineering Year: 2017
BOOK-CHAPTER

Word Embedding-based Topic Modeling

Slimane Bellaouar, Ahmed Itbirene, Brahim Chihani

Advances in Intelligent Systems Research Year: 2024 Pages: 89-102
JOURNAL ARTICLE

Probabilistic topic modeling for short text based on word embedding networks

Marcelo Pita, Matheus Nunes, Gisele L. Pappa

Journal: Applied Intelligence Year: 2022 Vol: 52 (15) Pages: 17829-17844