Deep Representation Learning for Speech Emotion Recognition

Latif, Siddique

doi:10.26192/w8w00

JOURNAL ARTICLE

Year: 2022

Journal: University of Southern Queensland research data collection

Abstract

The success of machine learning (ML) algorithms generally depends on the quality of the data representation, or features. Good representations of the data make it easier to develop ML predictors or even deep learning (DL) classifiers. In speech emotion recognition (SER) research, emotion classifiers depend heavily on hand-engineered acoustic features, which are typically crafted using human domain knowledge. Learning emotional representations automatically from speech is challenging because speech carries different attributes of the speaker (e.g., gender, age, and emotion) along with the linguistic message. Recent advances in DL have fuelled the area of deep representation learning from speech. The prime goal of deep representation learning is to learn complex relationships from input data, usually through nonlinear transformations. Research on deep representation learning has evolved significantly; however, very few studies have investigated emotional representation learning from speech using advanced DL techniques. In this thesis, I explore different deep representation learning techniques for SER to improve the performance and generalisation of SER systems. I broadly address two major problems: (1) how deep representation learning can improve SER performance by utilising unlabelled, synthetic, and augmented data; and (2) how deep representation learning can be applied to design generalised and robust SER systems. To address these problems, I propose different deep representation learning techniques that learn from unlabelled, synthetic, and augmented data. I found that injecting additional unlabelled, augmented, and synthetic data into SER systems helps improve their performance. I also show that adversarial self-supervised learning can improve cross-language SER and that deeper architectures learn robust, generalised representations for SER in noisy conditions.

Keywords:

Deep learning; Feature learning; Representation (politics); Multi-task learning; External Data Representation; Task (project management); Domain (mathematical analysis)
