JOURNAL ARTICLE

Deep Representation Learning for Speech Emotion Recognition

Latif, Siddique

Year: 2022 Journal: University of Southern Queensland research data collection

Abstract

The success of machine learning (ML) algorithms generally depends on the quality of data representation or features. Good representations of the data make it easier to develop machine learning predictors or even deep learning (DL) classifiers. In speech emotion recognition (SER) research, emotion classifiers depend heavily on hand-engineered acoustic features, which are typically crafted with human domain knowledge. Automatically learning emotional representations from speech is challenging because speech carries several attributes of the speaker (e.g., gender, age, and emotion) along with the linguistic message. Recent advancements in DL have fuelled the area of deep representation learning from speech. The prime goal of deep representation learning is to learn complex relationships from input data, usually through nonlinear transformations. Research on deep representation learning has evolved significantly; however, very few studies have investigated emotional representation learning from speech using advanced DL techniques. In this thesis, I explore different deep representation learning techniques for SER to improve the performance and generalisation of these systems. I broadly address two major problems: (1) how deep representation learning can improve the performance of SER by exploiting unlabelled, synthetic, and augmented data; and (2) how deep representation learning can be applied to design generalised and robust SER systems. To address these problems, I propose different deep representation learning techniques that learn from unlabelled, synthetic, and augmented data to improve the performance and generalisation of SER systems. I found that injecting additional unlabelled, augmented, and synthetic data into SER systems helps improve their performance.
I also show that adversarial self-supervised learning can improve cross-language SER, and that deeper architectures learn robust, generalised representations for SER in noisy conditions.
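The abstract mentions augmented data and robustness in noisy conditions without describing a specific procedure. One common, generic augmentation for speech is mixing in noise at a chosen signal-to-noise ratio (SNR), where SNR in dB is 10 * log10(P_signal / P_noise). The sketch below illustrates that idea under stated assumptions: white Gaussian noise, a synthetic tone standing in for an utterance, and the hypothetical helper names `add_noise_at_snr` and `snr_db`. It is not the thesis's own augmentation method.

```python
import math
import random


def signal_power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in x) / len(x)


def add_noise_at_snr(clean, snr_db, seed=0):
    """Hypothetical helper: return a copy of `clean` with white Gaussian noise
    scaled so that 10 * log10(P_signal / P_noise) equals `snr_db`.

    `clean` is a list of float samples; `seed` makes the augmentation reproducible.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    # Noise power required for the requested SNR.
    target_noise_power = signal_power(clean) / (10 ** (snr_db / 10))
    # Rescale the raw noise so its power matches the target exactly.
    scale = math.sqrt(target_noise_power / signal_power(noise))
    return [s + scale * n for s, n in zip(clean, noise)]


def snr_db(clean, noisy):
    """Measured SNR in dB between a clean signal and its noisy copy."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(signal_power(clean) / signal_power(residual))


if __name__ == "__main__":
    # Assumed stand-in for an utterance: 0.1 s of a 440 Hz tone at 16 kHz.
    sr = 16000
    tone = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 10)]
    # Produce several augmented copies at decreasing SNR (increasing noise).
    for target in (20, 10, 0):
        noisy = add_noise_at_snr(tone, target, seed=target)
        print(f"target {target} dB -> measured {snr_db(tone, noisy):.2f} dB")
```

Each augmented copy keeps the original emotion label, so a training set can be enlarged with noisier variants of the same utterances; real pipelines would typically mix recorded background noise rather than synthetic white noise.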

Keywords:
Deep learning; Feature learning; Representation; Multi-task learning; External Data Representation; Task; Domain

Metrics

Cited by: 1
FWCI (Field Weighted Citation Impact): 0.38
References: 0
Citation Normalized Percentile: 0.70

Topics

Research Data Management Practices
Physical Sciences → Computer Science → Information Systems
Species Distribution and Climate Change
Physical Sciences → Environmental Science → Ecological Modeling
Data Quality and Management
Social Sciences → Decision Sciences → Management Science and Operations Research
