Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network

George Trigeorgis; Fabien Ringeval; Raymond Brueckner; Erik Marchi; Mihalis A. Nicolaou; Björn W. Schuller; Stefanos Zafeiriou

doi:10.1109/icassp.2016.7472669

JOURNAL ARTICLE

Year: 2016 Pages: 5200-5204

Abstract

The automatic recognition of spontaneous emotions from speech is a challenging task. On the one hand, acoustic features need to be robust enough to capture the emotional content for various styles of speaking, and on the other, machine learning algorithms need to be insensitive to outliers while being able to model the context. Whereas the latter has been tackled by the use of Long Short-Term Memory (LSTM) networks, the former is still under very active investigation, even though more than a decade of research has provided a large set of acoustic descriptors. In this paper, we propose a solution to the problem of 'context-aware' emotionally relevant feature extraction by combining Convolutional Neural Networks (CNNs) with LSTM networks, in order to automatically learn the best representation of the speech signal directly from the raw time representation. In this novel work on so-called end-to-end speech emotion recognition, we show that the proposed topology significantly outperforms traditional approaches based on signal processing techniques for the prediction of spontaneous and natural emotions on the RECOLA database.

Keywords:

End-to-end principle; Computer science; Deep learning; Speech recognition; Artificial intelligence

Metrics

Cited By: 838

FWCI (Field Weighted Citation Impact): 96.99

Citation Normalized Percentile: 1.00

Is in top 1%

Is in top 10%

Topics

Emotion and Mood Recognition

Social Sciences → Psychology → Experimental and Cognitive Psychology

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Adieu Features? End-to-end Speech Emotion Recognition using a Deep Convolutional Recurrent Network

Adieu recurrence? End-to-end speech emotion recognition using a context stacking dilated convolutional network

Squeeze-and-excitation 3D convolutional attention recurrent network for end-to-end speech emotion recognition

End-to-End Speech Emotion Recognition Using Deep Neural Networks

End-to-End Speech Recognition Using Recurrent Neural Network (RNN)