JOURNAL ARTICLE

Topic Modeling based Text Classification Regarding Islamophobia using Word Embedding and Transformers Techniques

Ammar Saeed, Hikmat Ullah Khan, Achyut Shankar, Talha Imran, Danish Khan, M. Kamran, Muhammad Attique Khan

Year: 2023 Journal:   ACM Transactions on Asian and Low-Resource Language Information Processing   Publisher: Association for Computing Machinery

Abstract

Islamophobia is a rising area of concern in the current era, in which Muslims face discrimination and negative perceptions of their religion, Islam. Islamophobia is a form of racism practiced by individuals, groups, and organizations worldwide. Moreover, the ease of access to social media platforms and their growing usage have contributed to the spread of hate speech, false information, and negative opinions about Islam. In this research study, we focus on detecting Islamophobic textual content shared on various social media platforms. We explore state-of-the-art techniques in text data mining and Natural Language Processing (NLP). The topic modelling algorithm Latent Dirichlet Allocation (LDA) is used to find the top topics. Then, word embedding approaches such as Word2Vec and Global Vectors for Word Representation (GloVe) are used as feature extraction techniques. For text classification, we utilize modern transformer-based Deep Learning models, namely Bidirectional Encoder Representations from Transformers (BERT) and the Generative Pre-trained Transformer (GPT). For comparison, we conduct an extensive empirical analysis of Machine Learning and Deep Learning algorithms using conventional textual features such as Term Frequency-Inverse Document Frequency (TF-IDF), N-grams, and Bag of Words (BoW). The empirical results, evaluated using standard performance measures, show that the proposed approach effectively detects textual content related to Islamophobia. On the study corpus, the Support Vector Machine (SVM) performed best among the Machine Learning models with an F1 score of 91%. The transformer-based NLP models and the Deep Learning model, a Convolutional Neural Network (CNN) combined with GloVe, performed best among all techniques except SVM with BoW:
GPT, SVM combined with BoW, and BERT yielded the best F1 scores of 92%, 92%, and 91.9%, respectively, while the CNN performed slightly worse with an F1 score of 91%.
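The strongest conventional baseline reported above is SVM with Bag-of-Words features. A minimal sketch of that pipeline, using scikit-learn's CountVectorizer and LinearSVC; the tiny corpus and labels below are invented purely for illustration and are not from the paper's dataset:

```python
# Sketch of a Bag-of-Words + linear SVM text classifier,
# the conventional baseline reported in the abstract.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy illustrative corpus (labels: 1 = Islamophobic, 0 = neutral)
texts = [
    "hateful slur targeting muslims online",
    "mosque community hosts charity food drive",
    "spreading false claims about islam to incite fear",
    "local interfaith event welcomes all religions",
]
labels = [1, 0, 1, 0]

# CountVectorizer builds the BoW (here unigram + bigram) feature matrix;
# LinearSVC fits a linear-kernel SVM on top of it.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)

print(clf.predict(["spreading hateful claims about muslims"]))
```

On a real corpus the same pipeline would be evaluated with a train/test split and the F1 score, as in the paper's empirical comparison.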

Keywords:
Computer science; Word embedding; Artificial intelligence; Islamophobia; Natural language processing; Latent Dirichlet allocation; Word2vec; Sentiment analysis; Transformer; Feature learning; Social media; Support vector machine; Autoencoder; Deep learning; Machine learning; Topic model; Embedding; Engineering; World Wide Web; Islam

Metrics

Cited By: 12
FWCI (Field Weighted Citation Impact): 3.07
Refs: 46
Citation Normalized Percentile: 0.90 (in top 10%)

Topics

Hate Speech and Cyberbullying Detection
Physical Sciences →  Computer Science →  Artificial Intelligence
Terrorism, Counterterrorism, and Political Violence
Social Sciences →  Social Sciences →  Sociology and Political Science
Sentiment Analysis and Opinion Mining
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Short Text Classification Based on Latent Topic Modeling and Word Embedding

Peng Li, Junqing He, Chenglong Ma

Journal:   DEStech Transactions on Computer Science and Engineering Year: 2017
BOOK-CHAPTER

Word Embedding-based Topic Modeling

Slimane Bellaouar, Ahmed Itbirene, Brahim Chihani

Advances in Intelligent Systems Research Year: 2024 Pages: 89-102
JOURNAL ARTICLE

Probabilistic topic modeling for short text based on word embedding networks

Marcelo Pita, Matheus Nunes, Gisele L. Pappa

Journal: Applied Intelligence Year: 2022 Vol: 52 (15) Pages: 17829-17844