Improving Speech Emotion Recognition With Adversarial Data Augmentation Network

Yi Lu; Man‐Wai Mak

doi:10.1109/tnnls.2020.3027600

JOURNAL ARTICLE

Year: 2020
Journal: IEEE Transactions on Neural Networks and Learning Systems
Vol: 33 (1)
Pages: 172-184
Publisher: Institute of Electrical and Electronics Engineers

Abstract

When training data are scarce, it is challenging to train a deep neural network without overfitting. To overcome this challenge, this article proposes a new data augmentation network, the adversarial data augmentation network (ADAN), based on generative adversarial networks (GANs). The ADAN consists of a GAN, an autoencoder, and an auxiliary classifier. These networks are trained adversarially to synthesize class-dependent feature vectors in both the latent space and the original feature space, which can be added to the real training data for training classifiers. Instead of the conventional cross-entropy loss for adversarial training, the Wasserstein divergence is used in an attempt to produce high-quality synthetic samples. The proposed networks were applied to speech emotion recognition using EmoDB and IEMOCAP as the evaluation data sets. It was found that forcing the synthetic latent vectors and the real latent vectors to share a common representation largely alleviates the gradient vanishing problem. The results also show that the augmented data generated by the proposed networks are rich in emotion information. Thus, the resulting emotion classifiers are competitive with state-of-the-art speech emotion recognition systems.
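The augmentation step described in the abstract, adding class-dependent synthetic feature vectors to the real training data, can be sketched as follows. This is a minimal illustration only, not the authors' implementation: `class_mean_sampler` is a hypothetical placeholder that stands in for a trained generator (it simply perturbs per-class means with Gaussian noise), and the names `augment`, `noise_std`, and `n_per_class` and the toy data are assumptions made for the example.

```python
import random
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def class_mean_sampler(
    features: Sequence[Vector],
    labels: Sequence[int],
    noise_std: float = 0.1,
    seed: int = 0,
) -> Callable[[int], Vector]:
    """Build a placeholder class-conditional sampler (NOT the ADAN generator).

    It returns the per-class mean of the real vectors plus Gaussian noise,
    only so that the augmentation step below has something to call.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    # Column-wise mean of the real vectors for each class.
    means = {
        y: [sum(col) / len(xs) for col in zip(*xs)]
        for y, xs in by_class.items()
    }

    def sample(label: int) -> Vector:
        return [m + rng.gauss(0.0, noise_std) for m in means[label]]

    return sample


def augment(
    features: Sequence[Vector],
    labels: Sequence[int],
    sample: Callable[[int], Vector],
    n_per_class: int,
) -> Tuple[List[Vector], List[int]]:
    """Append n_per_class synthetic vectors for every class seen in labels."""
    aug_x = [list(x) for x in features]  # real data stays first, unchanged
    aug_y = list(labels)
    for label in sorted(set(labels)):
        for _ in range(n_per_class):
            aug_x.append(sample(label))
            aug_y.append(label)
    return aug_x, aug_y


# Toy data (assumed): four 2-D feature vectors in two classes.
real_x = [[0.0, 1.0], [0.2, 0.8], [1.0, 0.0], [0.8, 0.2]]
real_y = [0, 0, 1, 1]

sampler = class_mean_sampler(real_x, real_y)
train_x, train_y = augment(real_x, real_y, sampler, n_per_class=3)
```

In a real pipeline the sampler would be replaced by whatever trained generator produces the synthetic vectors, and the combined `train_x`, `train_y` would be passed to the downstream emotion classifier.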

Keywords:

Autoencoder; Computer science; Overfitting; Artificial intelligence; Adversarial system; Feature vector; Feature learning; Machine learning; Classifier (UML); Artificial neural network; Pattern recognition (psychology)

Metrics

Cited By: 108

FWCI (Field Weighted Citation Impact): 5.56

Citation Normalized Percentile: 0.97

Topics

Generative Adversarial Networks and Image Synthesis

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Adversarial Data Augmentation Network for Speech Emotion Recognition

Facial Emotion Recognition Data Augmentation using Generative Adversarial Network

Improving Speech Emotion Recognition Using Data Augmentation and Balancing Techniques

Towards Improving Speech Emotion Recognition Using Synthetic Data Augmentation from Emotion Conversion

Speech emotion recognition using data augmentation