DWFormer: Dynamic Window Transformer for Speech Emotion Recognition

Shuaiqi Chen; Xiaofen Xing; Wei-Bin Zhang; Weidong Chen; Xiangmin Xu

doi:10.1109/icassp49357.2023.10094651

JOURNAL ARTICLE

Year: 2023 Pages: 1-5

Abstract

Speech emotion recognition is crucial to human-computer interaction. The temporal regions that represent different emotions are scattered locally across different parts of the speech. Moreover, the temporal scales of important information may vary over a large range within and across speech segments. Although transformer-based models have made progress in this field, existing models cannot precisely locate important regions at different temporal scales. To address this issue, we propose Dynamic Window transFormer (DWFormer), a new architecture that leverages temporal importance by dynamically splitting samples into windows. A self-attention mechanism is applied within windows to capture temporally important information locally in a fine-grained way. Cross-window information interaction is also taken into account for global communication. DWFormer is evaluated on both the IEMOCAP and the MELD datasets. Experimental results show that the proposed model achieves better performance than the previous state-of-the-art methods.
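The abstract names three ingredients: dynamic splitting into windows by temporal importance, self-attention inside each window, and interaction across windows. The NumPy sketch below illustrates that general idea only and is not the authors' implementation. Everything in it is an assumption made for illustration: the function names (dynamic_windows, dw_block), the median-threshold splitting rule, the identity query/key/value projections, and the mean-pooled window tokens used for the cross-window step. The abstract does not specify how importance scores are obtained, so they are passed in as an input here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Single-head scaled dot-product attention with identity projections
    # (an assumption; a real model would learn query/key/value weights).
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

def dynamic_windows(importance, threshold=None):
    # Hypothetical splitting rule: start a new window wherever the
    # per-frame importance score crosses the threshold (median by default).
    if threshold is None:
        threshold = float(np.median(importance))
    mask = importance > threshold
    change = np.flatnonzero(mask[1:] != mask[:-1]) + 1
    bounds = [0, *change.tolist(), len(importance)]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]

def dw_block(x, importance):
    # x: (frames, dim) float features; importance: (frames,) scores.
    windows = dynamic_windows(importance)

    # Local step: attention restricted to each variable-length window.
    local = np.empty_like(x)
    for start, end in windows:
        local[start:end] = self_attention(x[start:end])

    # Global step (assumed design): mean-pool each window into one token,
    # attend across tokens, and add the result back to the window's frames.
    tokens = np.stack([local[s:e].mean(axis=0) for s, e in windows])
    mixed = self_attention(tokens)
    out = local.copy()
    for (start, end), token in zip(windows, mixed):
        out[start:end] += token
    return out, windows
```

Because the window boundaries depend on the importance scores, window lengths differ from utterance to utterance, unlike the fixed-size windows of a shifted-window design.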

Keywords:

Computer science; Transformer; Window (computing); Speech recognition; Architecture; Artificial intelligence; Engineering

Metrics

FWCI (Field Weighted Citation Impact): 10.00

Citation Normalized Percentile: 0.97

Topics

Emotion and Mood Recognition

Social Sciences → Psychology → Experimental and Cognitive Psychology

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

LGFormer: A Local-Global Dynamic Attention Window Transformer for Speech Emotion Recognition

Speech emotion recognition using the novel SwinEmoNet (Shifted Window Transformer Emotion Network)

SwinTSER: An Improved Bilingual Speech Emotion Recognition Using Shift Window Transformer

DropFormer: A Dynamic Noise-Dropping Transformer for Speech Emotion Recognition

DST: Deformable Speech Transformer for Emotion Recognition