Lightweight Speech Emotion Recognition Model Based on Multi-Task Learning

SONG Yukai, XIE Jiang

JOURNAL ARTICLE

Year: 2023 Journal: DOAJ (DOAJ: Directory of Open Access Journals)

Abstract

Current speech emotion recognition (SER) models suffer from large numbers of trainable parameters, poor generalization, and low recognition accuracy. When sample data are limited, it is therefore particularly important to build a lightweight model that improves both recognition efficiency and accuracy. To this end, this paper proposes a lightweight end-to-end multi-task deep learning model named P-CNN+Gender, which is composed of three parts: a speech feature combination network, a body convolutional network responsible for emotion and gender feature extraction, and emotion and gender classifiers. The model takes the Mel-Frequency Cepstral Coefficients (MFCC) of speech as input. The feature combination network applies convolutional kernels of different sizes to the MFCC features in parallel and combines the results, from which the body convolutional network extracts emotion and gender features. Finally, considering the correlation between emotional expression and gender, gender classification is added to emotion classification as an auxiliary task to improve the model's emotion classification performance. Tested on the IEMOCAP, Emo-DB, and CASIA speech emotion datasets, the model achieves Unweighted Accuracy (UA) of 73.3%, 96.4%, and 93.9%, which are 3.0, 5.8, and 6.5 percentage points higher than the P-CNN model, respectively. The model has only 1/10 to 1/2 as many trainable parameters as other models such as 3D-ACRNN and CNNBiRNN, and it achieves faster processing and higher accuracy.
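
The abstract states neither the training objective nor a definition of UA, so the following is only a minimal sketch under stated assumptions. It assumes the auxiliary gender task is combined with the emotion task as a weighted sum of two cross-entropy losses, and it uses the common SER convention that UA is the mean of per-class recalls. The function names and the `gender_weight` parameter are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the target class index."""
    return -math.log(softmax(logits)[target])


def multitask_loss(emotion_logits, emotion_label,
                   gender_logits, gender_label, gender_weight=0.5):
    """Emotion loss plus a weighted auxiliary gender loss.

    gender_weight is a hypothetical hyperparameter; the abstract does not
    say how the two tasks are balanced.
    """
    emotion_loss = cross_entropy(emotion_logits, emotion_label)
    gender_loss = cross_entropy(gender_logits, gender_label)
    return emotion_loss + gender_weight * gender_loss


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)
```

For example, `unweighted_accuracy([0, 0, 0, 1], [0, 0, 0, 0])` gives 0.5 even though three of the four predictions are correct, because the recall on class 1 is zero. This is why UA is often preferred for imbalanced emotion datasets.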

Keywords:

Mel-frequency cepstrum; Emotion recognition; Feature (linguistics); Task (project management); Emotion classification; Feature extraction; Convolutional neural network; Pattern recognition (psychology)

Metrics

Cited By

0.00

FWCI (Field Weighted Citation Impact)

Refs

0.53

Citation Normalized Percentile

Is in top 1%

Is in top 10%

Topics

Emotion and Mood Recognition

Social Sciences → Psychology → Experimental and Cognitive Psychology

Sentiment Analysis and Opinion Mining

Physical Sciences → Computer Science → Artificial Intelligence

Advanced Computing and Algorithms

Social Sciences → Social Sciences → Urban Studies
