End-to-End Binaural Speech Synthesis

Wen‐Chin Huang; Dejan Marković; Alexander Richard; Israel D. Gebru; Anjali Kondur Menon

doi:10.21437/interspeech.2022-10603

JOURNAL ARTICLE

Year: 2022; Journal: Interspeech 2022; Pages: 1218-1222

Abstract

In this work, we present an end-to-end binaural speech synthesis system that combines a low-bitrate audio codec with a powerful binaural decoder that is capable of accurate speech binauralization while faithfully reconstructing environmental factors like ambient noise or reverb. The network is a modified vector-quantized variational autoencoder, trained with several carefully designed objectives, including an adversarial loss. We evaluate the proposed system on an internal binaural dataset with objective metrics and a perceptual study. Results show that the proposed approach matches the ground truth data more closely than previous methods. In particular, we demonstrate the capability of the adversarial loss in capturing environment effects needed to create an authentic auditory scene.
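For readers unfamiliar with the term, the quantization step at the core of any vector-quantized autoencoder can be sketched as a nearest-neighbour lookup in a codebook. The snippet below is a generic illustration only, not the authors' network: the function names, the toy codebook, and the latent values are all hypothetical, and the paper's modifications, training objectives, and binaural decoder are not represented.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    latents: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Map each latent vector to the index and value of its nearest code."""
    indices: List[int] = []
    quantized: List[List[float]] = []
    for z in latents:
        # Nearest-neighbour lookup over all codebook entries.
        best = min(
            range(len(codebook)),
            key=lambda k: squared_distance(z, codebook[k]),
        )
        indices.append(best)                    # discrete code (what a codec would store)
        quantized.append(list(codebook[best]))  # vector handed on to a decoder
    return indices, quantized


# Hypothetical toy values, not taken from the paper.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
latents = [[0.9, 1.2], [-0.8, 0.7], [0.1, -0.2]]

indices, quantized = quantize(latents, codebook)
print(indices)
```

Only the integer indices need to be stored or transmitted, which is what makes this kind of bottleneck suitable for low-bitrate coding: a codebook with K entries costs at most ceil(log2 K) bits per index.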

Keywords: Computer science; Binaural recording; Speech synthesis; Speech recognition; End-to-end principle; Artificial intelligence

Metrics

FWCI (Field Weighted Citation Impact): 1.18

Citation Normalized Percentile: 0.78

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Speech and dialogue systems

Physical Sciences → Computer Science → Artificial Intelligence

Phonetics and Phonology Research

Social Sciences → Psychology → Experimental and Cognitive Psychology

Related Documents

End-To-End Multi-Channel Speaker Extraction and Binaural Speech Synthesis

Improving Unsupervised Style Transfer in end-to-end Speech Synthesis with end-to-end Speech Recognition

End‐to‐End Speech Synthesis for Tibetan Multidialect

End-to-End Speech Synthesis Based on BERT

End-to-End Paired Ambisonic-Binaural Audio Rendering