Abstract

Sentiment analysis, a subset of affective computing, is often categorized as a natural language processing task and is therefore restricted to the textual modality. Since the world around us is multimodal (we see objects, hear sounds, and feel the textures of things), sentiment analysis should extend to the different modalities present in our daily lives. In this paper, we implement sentiment analysis on two modalities: text and image. The study compares the performance of individual single-modal models with that of a multimodal model for the task of sentiment analysis. We employ a functional RNN model for textual sentiment analysis and a functional CNN model for visual sentiment analysis; multimodality is achieved by fusing the two. In addition, we compare two types of fusion, namely intermediate fusion and late fusion. The experimental results show an improvement over previous studies, with our fusion model achieving an accuracy of 79.63%. These promising results should help budding researchers explore prospects in multimodality and the affective domain.
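The two fusion strategies compared in the abstract can be illustrated with a minimal numpy sketch. This is not the paper's implementation; all array values, the branch names, and the single untrained linear layer standing in for the joint classifier are illustrative assumptions. Late fusion combines the final per-modality predictions, while intermediate fusion concatenates intermediate feature vectors before a joint classifier.

```python
import numpy as np

# Hypothetical per-modality outputs for one sample (illustrative values,
# not from the paper): a text branch (e.g., an RNN) and an image branch
# (e.g., a CNN), each ending in a 3-class sentiment head
# (negative / neutral / positive).
text_probs = np.array([0.2, 0.1, 0.7])   # softmax output of the text model
image_probs = np.array([0.3, 0.2, 0.5])  # softmax output of the image model

# Late fusion: combine the two final decisions, e.g. by averaging the
# class-probability vectors, then take the argmax.
late_fused = (text_probs + image_probs) / 2
late_label = int(np.argmax(late_fused))

# Intermediate fusion: concatenate intermediate (penultimate-layer)
# feature vectors and let a joint classifier decide. A random linear
# layer stands in for the trained joint head here.
text_feat = np.array([0.5, -0.1, 0.3, 0.8])
image_feat = np.array([0.1, 0.4, -0.2, 0.6])
joint_feat = np.concatenate([text_feat, image_feat])  # shape (8,)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, joint_feat.size))  # untrained stand-in classifier
logits = W @ joint_feat
inter_label = int(np.argmax(logits))

print(late_fused)   # fused class probabilities
print(late_label)   # predicted sentiment class index
```

The design trade-off this sketch highlights: late fusion keeps the two branches fully independent and only merges decisions, whereas intermediate fusion lets the joint classifier learn cross-modal interactions from the concatenated features, at the cost of training the branches together.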

Keywords:
Multimodality, Sentiment analysis, Modality (human–computer interaction), Natural language processing, Artificial intelligence, Machine learning, Fusion, Computer science, Linguistics, Engineering
