DST: Deformable Speech Transformer for Emotion Recognition

Weidong Chen; Xiaofen Xing; Xiangmin Xu; Jianxin Pang; Lan Du

doi:10.1109/icassp49357.2023.10096966

JOURNAL ARTICLE

Year: 2023 Pages: 1-5

Abstract

Enabled by multi-head self-attention, the Transformer has achieved remarkable results in speech emotion recognition (SER). Compared to the original full attention mechanism, window-based attention is more effective at learning fine-grained features while greatly reducing model redundancy. However, emotional cues appear at multiple granularities, so a pre-defined fixed window can severely limit the model's flexibility. In addition, it is difficult to find the optimal window settings manually. In this paper, we propose a Deformable Speech Transformer, named DST, for the SER task. DST determines which window sizes to use conditioned on the input speech via a lightweight decision network. Meanwhile, data-dependent offsets derived from acoustic features adjust the positions of the attention windows, allowing DST to adaptively discover and attend to the valuable information embedded in the speech. Extensive experiments on IEMOCAP and MELD demonstrate the superiority of DST.
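The two ideas named in the abstract, an input-conditioned window size and data-dependent window offsets, can be illustrated with a toy single-head sketch in NumPy. This is not the authors' implementation: the function name `deformable_window_attention`, the weights `w_size` and `w_offset`, the candidate sizes, the mean-pooled linear "decision network" stand-in, and the tanh-bounded per-frame offsets are all assumptions made only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def deformable_window_attention(x, w_size, w_offset, candidate_sizes=(4, 8, 16)):
    """Toy windowed attention with a data-dependent window size and position.

    x:        (T, D) frame-level acoustic features
    w_size:   (D, len(candidate_sizes)) hypothetical decision weights
    w_offset: (D,) hypothetical offset-projection weights
    """
    T, D = x.shape

    # Assumed stand-in for a decision network: a linear layer on the
    # mean-pooled utterance picks one window size from the candidates.
    size_logits = x.mean(axis=0) @ w_size
    window = candidate_sizes[int(np.argmax(size_logits))]

    # Assumed offset rule: each frame predicts an integer shift of its
    # window centre, bounded to +/- window by tanh.
    offsets = np.round(window * np.tanh(x @ w_offset)).astype(int)

    out = np.zeros_like(x)
    for t in range(T):
        # Shift the window centre by the predicted offset, then clip to range.
        center = int(np.clip(t + offsets[t], 0, T - 1))
        lo = max(0, center - window // 2)
        hi = min(T, lo + window)
        keys = x[lo:hi]                        # frames inside the shifted window
        scores = keys @ x[t] / np.sqrt(D)      # scaled dot-product scores
        out[t] = softmax(scores) @ keys        # attention-weighted sum
    return out, window, offsets
```

Calling the function on a random `(20, 8)` feature matrix returns an output of the same shape, the chosen window size, and one integer offset per frame. Unlike a fixed window centered on each frame, both the window size and its position here depend on the input.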

Keywords:

Computer science; Transformer; Speech recognition; Redundancy (engineering); Granularity; Window (computing); Artificial intelligence; Engineering

Metrics

FWCI (Field Weighted Citation Impact): 12.50

Citation Normalized Percentile: 0.98

Topics

Emotion and Mood Recognition

Social Sciences → Psychology → Experimental and Cognitive Psychology

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

TASER-Net: Transformer Based Speech Emotion Recognition

Speech Emotion Recognition Based on Swin-Transformer

Speech Emotion Recognition using CNN-TRANSFORMER Architecture

Vocal Sentiments: Transformer Based Speech Emotion Recognition

An Ensemble Transformer Model For Speech Emotion Recognition