Reyjohn R. Frias, Ruji P. Medina, Ariel M. Sison
Bilingualism is a common linguistic phenomenon that poses a challenge for opinion mining. Early methods in Cross-lingual Sentiment Analysis (CLSA), based on machine translation, parallel corpora, and bilingual sentiment lexicons, suffer from translation errors, limited vocabulary coverage, and dependence on extensive parallel data. Hence, this study examined the effectiveness of Cross-lingual Word Embeddings (CLWE) for sentiment analysis of a code-mixed Filipino-English corpus. A large-scale, manually annotated code-mixed dataset containing stakeholders' feedback on Higher Education Institutions' services and infrastructure was developed to address resource scarcity. Several pre-trained transformer-based CLWE methods, namely mBERT, XLM-R, and XLM-T, were employed to represent words from the two languages in the same vector space and obtain cross-lingual embeddings. An Attention-based BiLSTM-CNN neural architecture, the baseline model from previous work, was fine-tuned on these cross-lingual embeddings to perform sentiment analysis of the code-mixed Filipino-English corpus. The experimental results demonstrate that XLM-T achieved the highest performance, with 91.30% accuracy, 90.36% precision, 90.92% recall, and a 90.61% F1-score. Thus, employing cross-lingual word embeddings proved effective, increasing accuracy by up to 10.02% over the baseline model, which uses word embeddings with no cross-lingual alignment.
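The Attention-based BiLSTM-CNN classifier described above can be sketched as follows. This is a minimal illustration operating on precomputed cross-lingual token embeddings (e.g. 768-dimensional XLM-R vectors); the hidden sizes, kernel width, number of sentiment classes, and exact layer ordering are assumptions for illustration, not the paper's reported configuration.

```python
import torch
import torch.nn as nn

class AttnBiLSTMCNN(nn.Module):
    """Sketch of an attention-based BiLSTM-CNN sentiment classifier
    over precomputed cross-lingual embeddings (sizes illustrative)."""
    def __init__(self, emb_dim=768, hidden=128, n_classes=3):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)          # per-token attention score
        self.conv = nn.Conv1d(2 * hidden, 64, kernel_size=3, padding=1)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                  # x: (batch, seq_len, emb_dim)
        h, _ = self.bilstm(x)              # (batch, seq_len, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)        # attention weights over time
        h = h * w                          # attention-weighted hidden states
        c = torch.relu(self.conv(h.transpose(1, 2)))  # (batch, 64, seq_len)
        c = c.max(dim=2).values            # global max-pool over time
        return self.fc(c)                  # (batch, n_classes) logits

# Usage: a batch of 4 sentences of 20 tokens with random stand-in embeddings
model = AttnBiLSTMCNN()
logits = model(torch.randn(4, 20, 768))
print(logits.shape)   # torch.Size([4, 3])
```

In practice the random tensor would be replaced by contextual embeddings extracted from mBERT, XLM-R, or XLM-T, so that Filipino and English tokens share one vector space before the classifier is fine-tuned.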