Emotion is an essential aspect of human speech that is manifested in speech prosody. Speech, visual, and textual cues are complementary in human communication. In this paper, we study a hybrid fusion method, referred to as the multi-modal attention network (MMAN), that exploits visual and textual cues in speech emotion recognition. We propose a novel multi-modal attention mechanism, cLSTM-MMA, which facilitates attention across the three modalities and selectively fuses their information. cLSTM-MMA is combined with other uni-modal sub-networks in a late fusion stage. Experiments show that speech emotion recognition benefits significantly from visual and textual cues, and that the proposed cLSTM-MMA alone is as competitive as other fusion methods in terms of accuracy, while having a much more compact network structure. The proposed hybrid network MMAN achieves state-of-the-art performance on the IEMOCAP database for emotion recognition.
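As a rough illustration of the idea, the sketch below shows one way cross-modal attention fusion over speech, visual, and text features might be wired up in PyTorch: each modality's encoded sequence queries the concatenation of all three modalities, and the attended summaries are fused for classification. This is a minimal sketch under our own assumptions, not the authors' implementation; the module name `MultiModalAttention`, the dimensions, head counts, and the four-class output are all illustrative.

```python
# Minimal sketch (assumed design, not the paper's code) of attention
# across three modalities with selective fusion for emotion recognition.
import torch
import torch.nn as nn


class MultiModalAttention(nn.Module):
    """Each modality queries the concatenated key/value sequence of all
    three modalities, so attention can flow across modalities."""

    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        self.attn = nn.ModuleDict({
            m: nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for m in ("speech", "visual", "text")
        })
        # Map the concatenated attended summaries to emotion logits
        # (4 classes as in common IEMOCAP setups; an assumption here).
        self.classifier = nn.Linear(3 * d_model, 4)

    def forward(self, feats: dict) -> torch.Tensor:
        # feats[m]: (batch, seq_len_m, d_model) per-modality encodings,
        # e.g. produced by per-modality LSTM encoders upstream.
        memory = torch.cat(list(feats.values()), dim=1)  # shared keys/values
        pooled = []
        for m, x in feats.items():
            attended, _ = self.attn[m](query=x, key=memory, value=memory)
            pooled.append(attended.mean(dim=1))  # temporal average pooling
        return self.classifier(torch.cat(pooled, dim=-1))


if __name__ == "__main__":
    model = MultiModalAttention()
    batch = {
        "speech": torch.randn(2, 50, 128),
        "visual": torch.randn(2, 30, 128),
        "text": torch.randn(2, 20, 128),
    }
    print(model(batch).shape)  # torch.Size([2, 4])
```

In a hybrid setup like the one described, the logits from such a cross-modal module would be combined with those of uni-modal sub-networks at a late fusion stage.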