JOURNAL ARTICLE

Multimodal Approach: Emotion Recognition from Audio and Video Modality

Abstract

Emotion recognition is a domain of artificial intelligence, within affective computing, that aims to recognize human emotions. It is an interdisciplinary field of computer science that is emerging as a new area of research. Recognizing emotions from different input sources using computational intelligence is an advanced application of artificial intelligence and has benefited numerous applications in human-computer interaction, healthcare, and psychology. The proposed work studies a multimodal approach to emotion recognition that takes input from audio and video modalities. The study critically analyses the performance and improvement of deep learning models for emotion recognition and of fusion techniques for audio-video input modalities. The proposed approach demonstrates the effectiveness of the multimodal approach and the challenges of multimodal fusion while building a model on the RAVDESS and IEMOCAP datasets. In essence, the research strives to contribute to the growing body of knowledge in the field of audio-based emotion recognition by providing a comprehensive analysis of feature extraction techniques and deep learning models. By understanding the strengths and limitations of each approach, researchers and practitioners can make informed decisions when developing systems that accurately capture and interpret emotions from audio and video modalities. In conclusion, this advancement has the potential to enhance human-computer interaction, improve mental health assessment, and enrich other domains where emotion plays a pivotal role.
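
The abstract refers to fusion of audio and video modalities without naming a specific technique. As a purely illustrative sketch, and not the authors' method, one common option is decision-level (late) fusion: each modality's model produces class scores, the scores are turned into probabilities, and the two distributions are combined with a weight. The label set `EMOTIONS`, the function names, and the default `audio_weight` below are all hypothetical choices made for this example.

```python
import math

# Hypothetical label set, chosen only for illustration.
EMOTIONS = ["neutral", "happy", "sad", "angry"]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(audio_scores, video_scores, audio_weight=0.5):
    """Weighted average of per-modality class probabilities.

    audio_weight is an assumed hyperparameter in [0, 1]; the video
    modality receives the remaining weight (1 - audio_weight).
    """
    if len(audio_scores) != len(video_scores):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    p_audio = softmax(audio_scores)
    p_video = softmax(video_scores)
    return [
        audio_weight * a + (1.0 - audio_weight) * v
        for a, v in zip(p_audio, p_video)
    ]


def predict(audio_scores, video_scores, audio_weight=0.5):
    """Return the most probable emotion label and the fused distribution."""
    if len(audio_scores) != len(EMOTIONS):
        raise ValueError("expected one score per emotion label")
    fused = late_fusion(audio_scores, video_scores, audio_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best], fused
```

In this sketch, an uninformative audio model (all scores equal) leaves the decision to the video model, while `audio_weight=1.0` ignores video entirely. Feature-level (early) fusion, where features are concatenated before classification, is the usual alternative and would need a jointly trained model.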

Keywords:
Modalities; Computer science; Affective computing; Field (mathematics); Modality (human–computer interaction); Emotion recognition; Multimodal learning; Artificial intelligence; Deep learning; Multimodality; Human–computer interaction

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.83
Refs: 24
Citation Normalized Percentile: 0.71

Topics

Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
EEG and Brain-Computer Interfaces
Life Sciences →  Neuroscience →  Cognitive Neuroscience
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing

Related Documents

JOURNAL ARTICLE

Multimodal emotion recognition from audio and video

S Nithyasri, B Hemavarthini, Bharathi N. Gopalsamy

Journal: International Journal of Science and Research Archive, Year: 2024, Vol: 12 (1), Pages: 142-149

JOURNAL ARTICLE

Audio and Video-based Emotion Recognition using Multimodal Transformers

Vijay John, Yasutomo Kawanishi

Journal: 2022 26th International Conference on Pattern Recognition (ICPR), Year: 2022, Pages: 2582-2588