JOURNAL ARTICLE

Hindi-English Code Mixed Hate Speech Detection using Character Level Embeddings

Abstract

Hinglish is a portmanteau of 'Hindi' and 'English' and refers to the informal "language" predominantly used in the South Asian (Indian) subcontinent, a blend of the two languages from which it derives its name. It differs considerably from English in grammar, syntax, punctuation, phonetics, and accent, as well as in sentiment. Because it is more convenient to use English for certain technical words, sports events, scientific phenomena, and other things, mixed usage of English and regional languages has gained considerable prominence in day-to-day conversation and on social media. This research aims to create an independent, self-sufficient model that classifies Hinglish texts as Hate Speech, Abusive, or Non-Offensive. The prevalent use of code-mixed language in the subcontinent, the sensitive nature of hate speech, and the need for a self-sufficient model for Hinglish together serve as the motivation for this research. We used character-level embeddings for Hinglish, which have the potential to extract context from Hinglish sentences most efficiently, given the variation in syntax and semantics of code-mixed language (language that combines two or more languages). We then trained various deep learning classifier models. A hybrid of GRU with an attention model performed best among the more than 12 models we experimented with. The use of character-level embeddings, GRU, and an attention layer is novel to hate speech detection in Hinglish code-mixed language.
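As a rough illustration of the character-level idea described in the abstract, the following pure-Python sketch shows how a Hinglish sentence could be turned into a fixed-length sequence of character indices, and how an attention-style weighted sum could pool per-character vectors into a single sentence vector. It is not the authors' implementation. The function names, the reserved `PAD`/`UNK` indices, and the dot-product scoring are assumptions made for this example, and the GRU itself is omitted: in a full model, the `states` passed to `attention_pool` would be the GRU outputs computed over learned character embeddings.

```python
import math
from typing import Dict, List

# Reserved indices for padding and unseen characters (an assumption of this sketch).
PAD, UNK = 0, 1


def build_char_vocab(texts: List[str]) -> Dict[str, int]:
    """Map every character seen in the corpus to an integer index (starting at 2)."""
    chars = sorted({ch for text in texts for ch in text.lower()})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert a string into exactly max_len character indices (truncate or pad)."""
    ids = [vocab.get(ch, UNK) for ch in text.lower()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def attention_pool(states: List[List[float]], query: List[float]) -> List[float]:
    """Softmax-weighted sum of per-character state vectors.

    Each state is scored by its dot product with `query`; the scores are
    normalised with a numerically stable softmax and used as weights.
    """
    scores = [sum(s * q for s, q in zip(state, query)) for state in states]
    peak = max(scores)
    weights = [math.exp(score - peak) for score in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(states[0])
    return [sum(w * state[d] for w, state in zip(weights, states)) for d in range(dim)]
```

Working at the character level means that spelling variants common in romanised Hindi (for example "acchi" and "achi") share most of their index sequence, which is one reason such inputs may cope better with unstandardised spelling than a word-level vocabulary would. A call such as `encode("yeh movie acchi thi", build_char_vocab(corpus), 32)`, where `corpus` is a hypothetical list of training sentences, yields the integer sequence that an embedding layer would consume.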

Keywords:
Computer science; Natural language processing; Linguistics; Artificial intelligence; Language identification; Character (mathematics); Code-switching; Hindi; Language model; Syntax; Cache language model; Natural language; Universal Networking Language; Comprehension approach

Metrics

Cited By: 14
FWCI (Field Weighted Citation Impact): 1.55
Refs: 33
Citation Normalized Percentile: 0.85

Topics

Hate Speech and Cyberbullying Detection
Physical Sciences →  Computer Science →  Artificial Intelligence
Swearing, Euphemism, Multilingualism
Social Sciences →  Social Sciences →  Communication