Experimenting Language Identification for Sentiment Analysis of English Punjabi Code Mixed Social Media Text

Neetika Bansal; Vishal Goyal; Simpel Rani

doi:10.4018/978-1-6684-6303-1.ch076

BOOK-CHAPTER

Year: 2022; IGI Global eBooks; Pages: 1470-1479; Publisher: IGI Global

Abstract

People do not always write in Unicode; rather, they mix multiple languages. Processing code-mixed data is challenging because of its linguistic complexity, and noisy text makes language identification harder still. The dataset used in this article contains Facebook and Twitter messages collected through the Facebook Graph API and the Twitter API. The annotated English-Punjabi code-mixed dataset is used to train a pipeline that combines a dictionary vectorizer with an n-gram approach and some additional features. Three classifiers, Logistic Regression, Decision Tree, and Gaussian Naïve Bayes, perform language identification at the word level. The results show that Logistic Regression performs best, with an accuracy of 86.63% and an F1 measure of 0.88. The success of machine learning approaches depends on the quality of labeled corpora.
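The word-level pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn, assumes the n-grams are character n-grams of each word, and uses a tiny invented word list (`TRAIN`) with hypothetical labels `en` and `pa` in place of the annotated corpus. The helper names `word_features`, `train_language_identifier`, and `tag_words` are likewise made up for this example.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def word_features(word, n_values=(1, 2, 3)):
    """Map one token to a feature dict (assumed feature set, for illustration)."""
    w = word.lower()
    padded = f"^{w}$"  # boundary markers keep prefixes and suffixes distinct
    feats = {
        "length": len(w),
        "has_digit": int(any(ch.isdigit() for ch in w)),
    }
    # Character n-grams of the padded word, one binary feature per n-gram.
    for n in n_values:
        for i in range(len(padded) - n + 1):
            feats[f"ng{n}={padded[i:i + n]}"] = 1
    return feats


# Toy, hand-written examples (NOT the chapter's dataset).
TRAIN = [
    ("tusi", "pa"), ("kithe", "pa"), ("changa", "pa"), ("mera", "pa"),
    ("nahi", "pa"), ("kudi", "pa"), ("munda", "pa"), ("bahut", "pa"),
    ("the", "en"), ("good", "en"), ("movie", "en"), ("where", "en"),
    ("is", "en"), ("my", "en"), ("friend", "en"), ("very", "en"),
]


def train_language_identifier(labelled_words):
    """Fit DictVectorizer + Logistic Regression on (word, label) pairs."""
    X = [word_features(w) for w, _ in labelled_words]
    y = [label for _, label in labelled_words]
    model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X, y)
    return model


def tag_words(model, words):
    """Return (word, predicted language label) for each token."""
    preds = model.predict([word_features(w) for w in words])
    return list(zip(words, preds))


model = train_language_identifier(TRAIN)
print(tag_words(model, "movie bahut changi si".split()))
```

Swapping `LogisticRegression` for `DecisionTreeClassifier` or `GaussianNB` would give the other two classifiers named in the abstract; note that `GaussianNB` expects dense input, so `DictVectorizer(sparse=False)` would be needed in that case.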

Keywords:

Computer science; Natural language processing; Artificial intelligence; Unicode; Social media; n-gram; Language identification; Sentiment analysis; Classifier (UML); Identification (biology); Naive Bayes classifier; Logistic regression; Decision tree; Language model; Machine learning; World Wide Web; Support vector machine; Natural language

Metrics

FWCI (Field Weighted Citation Impact): 0.36

Citation Normalized Percentile: 0.57

Topics

Authorship Attribution and Profiling

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Hate Speech and Cyberbullying Detection

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Bilingual Sentiment Analysis for a Code-mixed Punjabi English Social Media Text

Enhancing Language Identification For English-Punjabi (Romanized) Code-Mixed Social Media Text Using Transformers

Sentiment Analysis on Hindi–English Code-Mixed Social Media Text

Sentiment Analysis of English-Punjabi Code Mixed Social Media Content for Agriculture Domain