JOURNAL ARTICLE

Speech Emotion Recognition with Local-Global Aware Deep Representation Learning

Abstract

Convolutional neural network (CNN) based deep representation learning methods for speech emotion recognition (SER) have demonstrated great success. However, the basic design of CNNs limits them to modeling only local information well. Capsule networks (CapsNet) can overcome this shortcoming of CNNs by capturing shallow global features from the spectrogram, but CapsNet cannot learn local or deep global information. In this paper, we propose a local-global aware deep representation learning system that mainly comprises two modules. One module contains a multi-scale CNN, the time-frequency CNN (TFCNN), to learn the local representation. In the other module, we introduce a structure with dense connections between multiple blocks to learn shallow and deep global information. Every block in this structure is a complete CapsNet improved by a new routing algorithm. The local and global representations are fed to the classifier, which achieves an absolute improvement of at least 4.25% over benchmarks on IEMOCAP.
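The two-module layout described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "dense connections" means DenseNet-style connectivity (each block receives the input together with the outputs of all earlier blocks) and that the local and global representations are fused by simple concatenation; the abstract confirms neither detail. The names `dense_stack`, `fuse`, and `toy_block` are hypothetical, and `toy_block` is only a stand-in for a full CapsNet block.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def dense_stack(x: Vector, blocks: Sequence[Block]) -> Vector:
    """Run blocks with DenseNet-style connectivity (an assumption).

    Each block sees the input concatenated with the outputs of all earlier
    blocks; the result is the concatenation of every block's output.
    """
    features: List[Vector] = [x]
    for block in blocks:
        block_input = [v for feat in features for v in feat]
        features.append(block(block_input))
    # Drop the raw input; keep shallow (early) and deep (late) outputs.
    return [v for feat in features[1:] for v in feat]


def fuse(local: Vector, global_: Vector) -> Vector:
    """Concatenate local and global representations (assumed fusion)."""
    return list(local) + list(global_)


def toy_block(v: Vector) -> Vector:
    """Hypothetical stand-in for a CapsNet block: mean and max of the input."""
    return [sum(v) / len(v), max(v)]


# Global branch: two densely connected toy blocks over a toy input.
global_repr = dense_stack([1.0, 2.0, 3.0], [toy_block, toy_block])
# Local branch output is faked here; a real one would come from the TFCNN.
fused = fuse([0.5, 0.25], global_repr)
```

In this sketch the first block's output plays the role of the shallow global feature and the second block's output the deeper one; the fused vector is what a classifier would consume.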

Keywords:
Computer science, Convolutional neural network, Deep learning, Artificial intelligence, Spectrogram, Representation (politics), Feature learning, Block (permutation group theory), Classifier (UML), Pattern recognition (psychology), Speech recognition

Metrics

Cited By: 55
FWCI (Field Weighted Citation Impact): 7.21
Refs: 31
Citation Normalized Percentile: 0.97

Topics

Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing

Related Documents

JOURNAL ARTICLE

Deep Representation Learning for Speech Emotion Recognition

Latif, Siddique

Journal:   University of Southern Queensland research data collection Year: 2022

JOURNAL ARTICLE

Speech Emotion Recognition with Global-Aware Fusion on Multi-Scale Feature Representation

Wenjing Zhu, Xiang Li

Journal:   ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Year: 2022 Pages: 6437-6441

JOURNAL ARTICLE

Speech Emotion Recognition with deep learning

Hadhami Aouani, Yassine Ben Ayed

Journal:   Procedia Computer Science Year: 2020 Vol: 176 Pages: 251-260