Speech emotion recognition method based on time-aware bidirectional multi-scale network

Liyan Zhang; Jiaxin Du; Jiayan Li; Xinyu Wang

doi:10.1088/1742-6596/2816/1/012102

JOURNAL ARTICLE

Year: 2024
Journal: Journal of Physics: Conference Series
Vol: 2816 (1)
Pages: 012102
Publisher: IOP Publishing

Abstract

Traditional speech emotion recognition models have difficulty capturing long-distance dependencies in speech signals and are sensitive to changes in a speaker's speaking rate and pause duration. To address this, the paper proposes a temporal emotion modeling method, the Time Perceived Bidirectional Multi-scale Network (TIM-Net), which learns multi-scale contextual emotion representations across different time scales. TIM-Net first extracts temporal emotional representations using time-aware blocks. It then combines information from different time points to strengthen its contextual understanding of the emotional expression. Finally, it fuses the features from the various time scales to better accommodate emotional fluctuations. Experiments show that the network focuses on the useful information in the features, and that TIM-Net's weighted average recall (WAR) and unweighted average recall (UAR) are significantly better than those of other models on the RAVDESS, EMO-DB, and EMOVO datasets.
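The abstract reports results as WAR and UAR without defining them. As a minimal sketch, assuming the usual definitions in speech emotion recognition (WAR is per-class recall weighted by class size, which equals overall accuracy; UAR is the plain mean of per-class recall), the two metrics can be computed as follows. The function name `war_uar` and the toy labels are illustrative only and are not taken from the paper.

```python
from collections import Counter


def war_uar(y_true, y_pred):
    """Return (WAR, UAR) for two equal-length sequences of class labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    # Number of utterances per true class.
    support = Counter(y_true)
    # Number of correctly recognised utterances per true class.
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # WAR: recall weighted by class size, i.e. overall accuracy.
    war = sum(hits.values()) / len(y_true)
    # UAR: unweighted mean of per-class recall, so each emotion counts equally.
    uar = sum(hits[c] / n for c, n in support.items()) / len(support)
    return war, uar


# Toy, made-up labels with an imbalanced class distribution (not paper data).
y_true = ["angry"] * 6 + ["sad"] * 2 + ["happy"] * 2
y_pred = ["angry"] * 6 + ["sad", "happy"] + ["happy", "sad"]
war, uar = war_uar(y_true, y_pred)
print(f"WAR={war:.3f} UAR={uar:.3f}")
```

On imbalanced data the two values diverge: here the majority class is recognised perfectly, which lifts WAR, while UAR is pulled down by the two minority classes.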

Keywords:

Computer science; Speech recognition; Focus (optics); Scale (ratio); Pronunciation; Emotional expression; Expression (computer science); Net (polyhedron); Emotion recognition; Artificial intelligence; Psychology; Cognitive psychology; Linguistics

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.17

Topics

Emotion and Mood Recognition

Social Sciences → Psychology → Experimental and Cognitive Psychology

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Multi-scale Aggregation Network for Speech Emotion Recognition

[Research on bimodal emotion recognition algorithm based on multi-branch bidirectional multi-scale time perception].

MFAN: Multi-Scale Feature Attention Network for Speech Emotion Recognition

Speech Emotion Recognition with Global-Aware Fusion on Multi-Scale Feature Representation

MBDA: A Multi-scale Bidirectional Perception Approach for Cross-Corpus Speech Emotion Recognition