Hinglish, a portmanteau of 'Hindi' and 'English', refers to the informal "language" used predominantly in the Indian subcontinent, a blend of the two languages from which it takes its name. It differs considerably from English in grammar, syntax, punctuation, phonetics, and accent, as well as in how sentiment is expressed. Because English is more convenient for certain technical terms, sports events, scientific phenomena, and similar topics, mixed usage of English and regional languages has gained considerable prominence in day-to-day conversation and on social media. This research aims to create an independent, self-sufficient model that classifies Hinglish texts as Hate Speech, Abusive, or Non-Offensive. The prevalent use of code-mixed language (language that combines two or more languages) in the subcontinent, the sensitive nature of hate speech, and the need for a self-sufficient model for Hinglish together motivate this research. We use character-level embeddings for Hinglish, which have the potential to extract context from Hinglish sentences efficiently given the variation in syntax and semantics of the code-mixed language. We then train various deep learning classifiers. A hybrid of a GRU with an attention model performed best among the more than 12 models we experimented with. The use of character-level embeddings, a GRU, and an attention layer is novel to hate speech detection in Hinglish code-mixed language.
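To make the idea of character-level input concrete, the following is a minimal sketch of how Hinglish text might be turned into fixed-length sequences of character indices before being passed to an embedding layer and a GRU. It is not the authors' preprocessing pipeline: the function names `build_char_vocab` and `encode`, the reserved `PAD`/`UNK` indices, the `max_len` value, and the two example sentences are all assumptions made for illustration.

```python
from typing import Dict, List

# Reserved indices (an assumed convention, not taken from the paper).
PAD, UNK = 0, 1


def build_char_vocab(texts: List[str]) -> Dict[str, int]:
    """Map every lowercase character seen in the texts to an index >= 2."""
    chars = sorted({ch for text in texts for ch in text.lower()})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Truncate or right-pad a text to max_len character indices."""
    ids = [vocab.get(ch, UNK) for ch in text.lower()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Hypothetical examples: two romanized spellings of a similar sentence.
corpus = ["yeh movie bahut acchi hai", "ye movie bohot achi h"]
vocab = build_char_vocab(corpus)
encoded = [encode(t, vocab, max_len=32) for t in corpus]
```

Because romanized Hindi has no standard spelling, variants such as "bahut" and "bohot" would be unrelated tokens in a word-level vocabulary, whereas at the character level they share most of their indices. This is one plausible reason a character-level representation suits code-mixed text.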