Hindi-English Code Mixed Hate Speech Detection using Character Level Embeddings

Rahul Kumar; Vasu Gupta; Vibhu Sehra; Yashaswi Raj Vardhan

doi:10.1109/iccmc51019.2021.9418261

JOURNAL ARTICLE

Year: 2021 Pages: 1112-1118

Abstract

Hinglish, a portmanteau of 'Hindi' and 'English', refers to the informal "language" predominantly used in the South Asian (Indian) subcontinent, a blend of the two languages from which it derives its name. It differs considerably from English in grammar, syntax, punctuation, phonetics, and accent, as well as in sentiment. Because it is more convenient to use English for certain technical words, sports events, scientific phenomena, and other things, mixed usage of English and regional languages has gained considerable prominence in day-to-day conversations and on social media. This research aims to create an independent and self-sufficient model that classifies Hinglish texts as Hate Speech, Abusive, or Non-Offensive. The prevalent use of code-mixed language in the subcontinent, the sensitive nature of hate speech, and the need for a self-sufficient model for Hinglish together serve as the motivation for this research. We used character-level embeddings for Hinglish, which have the potential to extract context from Hinglish sentences most efficiently, given the level of variation in syntax and semantics of the code-mixed language (a language that combines two or more languages). We then trained various deep learning classifier models. A hybrid of GRU with an attention model performed best among the more than 12 models experimented with. The use of character-level embeddings, GRU, and an attention layer is novel to hate speech detection in Hinglish code-mixed language.
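The abstract does not describe the preprocessing in detail, so the following is only a minimal sketch of the character-level encoding step that would typically feed an embedding layer. The function names, the reserved `<pad>` and `<unk>` symbols, the `max_len` value, and the example sentences are all assumptions made for illustration; none of them come from the paper. The sketch shows why a character-level view can help with code-mixed text: spelling variants of the same romanized Hindi word (here "nahi" and "nahin") share most of their character ids, whereas a word-level vocabulary would treat them as unrelated tokens.

```python
from collections import Counter

# Hypothetical reserved symbols for padding and unseen characters.
PAD, UNK = "<pad>", "<unk>"


def build_char_vocab(texts, min_count=1):
    """Map each character seen at least min_count times to an integer id."""
    counts = Counter(ch for text in texts for ch in text.lower())
    vocab = {PAD: 0, UNK: 1}
    for ch in sorted(counts):
        if counts[ch] >= min_count:
            vocab[ch] = len(vocab)
    return vocab


def encode(text, vocab, max_len=32):
    """Turn a string into a fixed-length list of character ids."""
    # Truncate to max_len, map unseen characters to UNK, then right-pad.
    ids = [vocab.get(ch, vocab[UNK]) for ch in text.lower()[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Made-up example sentences, not taken from the paper's dataset.
corpus = ["yeh match bahut accha tha", "ye match bohot acha nahi tha"]
vocab = build_char_vocab(corpus)

# Two spellings of the same word share their first four character ids.
a = encode("nahi", vocab, max_len=8)
b = encode("nahin", vocab, max_len=8)
shared_prefix = a[:4] == b[:4]
```

In a full model, each id sequence would be passed through a trainable embedding table and then a recurrent layer such as a GRU with attention; that part is omitted here because the abstract gives no architectural details beyond naming those components.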

Keywords:

Computer science; Natural language processing; Linguistics; Artificial intelligence; Language identification; Character (mathematics); Code-switching; Hindi; Language model; Syntax; Cache language model; Natural language; Universal Networking Language; Comprehension approach

Metrics

FWCI (Field Weighted Citation Impact): 1.55

Citation Normalized Percentile: 0.85

Topics

Hate Speech and Cyberbullying Detection

Physical Sciences → Computer Science → Artificial Intelligence

Swearing, Euphemism, Multilingualism

Social Sciences → Social Sciences → Communication

Related Documents

Hate Speech Detection in Hindi-English Code-Mixed Social Media Text

Detection of Hate Speech Text in Hindi-English Code-mixed Data

Hate Speech Detection in Code-Mixed Datasets Using Pretrained Embeddings and Transformers

Code-Mixed Romanized Hindi Hate Speech Identification: Leveraging BERT Embeddings and Particle Swarm Optimization

Hate Speech Detection in Code-Mixed English-Hindi with Bilingual Large Language Models