Modeling Irregular Voice in End-to-End Speech Synthesis via Speaker Adaptation

Ali Raheem Mandeel; Mohammed Salah Al-Radhi; Tamás Gábor Csapó

doi:10.1109/sped59241.2023.10314920


JOURNAL ARTICLE


Year: 2023 Pages: 170-175



Abstract

End-to-end text-to-speech (TTS) synthesizers may fail to produce speech that resembles the target speaker when the adaptation data is limited and/or chosen randomly. Creaky voice may occur frequently, depending on the speaker and the context. This paper uses speaker adaptation to model creaky voice in speech synthesis. We adapted FastSpeech 2 to four target speakers, selecting the adaptation data according to the occurrence of creaky phonation: 1) sentences with frequent creaky voice, 2) randomly chosen sentences, and 3) sentences with little creaky voice. In an objective evaluation, the proposed model successfully modeled creaky voice with data selection (1), producing speech with more creakiness than the other two selections. In a subjective test, averaged over the four speakers, the samples synthesized from the frequent-creaky-voice data were slightly less preferred than those synthesized from the adaptation sentences with little creaky voice. Irregular voice models might contribute to building emotional and personalized speech synthesis.
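The three data selections described in the abstract can be illustrated with a minimal sketch. The abstract does not say how creaky phonation was detected or how sentences were ranked, so everything here is an assumption for illustration: the function name `select_adaptation_sets`, the per-sentence `creak_ratio` score (imagined as the fraction of frames flagged creaky by some detector), and the example values are all hypothetical and are not the authors' method.

```python
import random
from typing import Dict, List


def select_adaptation_sets(
    creak_ratio: Dict[str, float], n: int, seed: int = 0
) -> Dict[str, List[str]]:
    """Build three adaptation sets of n sentence IDs for one speaker.

    creak_ratio maps a sentence ID to a hypothetical creakiness score
    (assumed here: fraction of frames flagged as creaky).
    """
    if n < 1 or n > len(creak_ratio):
        raise ValueError("n must be between 1 and the number of sentences")
    # Rank from most to least creaky; ties are broken by ID for determinism.
    ranked = sorted(creak_ratio, key=lambda s: (-creak_ratio[s], s))
    rng = random.Random(seed)
    return {
        "frequent_creak": ranked[:n],                       # selection (1)
        "random": rng.sample(sorted(creak_ratio), n),       # selection (2)
        "little_creak": ranked[-n:],                        # selection (3)
    }


# Hypothetical scores for five sentences of one speaker.
ratios = {"s1": 0.30, "s2": 0.02, "s3": 0.15, "s4": 0.00, "s5": 0.22}
sets = select_adaptation_sets(ratios, n=2)
print(sets["frequent_creak"])  # ['s1', 's5']
print(sets["little_creak"])    # ['s2', 's4']
```

Each resulting set would then serve as the adaptation data for a separate fine-tuning run of the TTS model, so that the synthesized outputs can be compared across selections.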

Keywords:

Computer science; Speech recognition; Adaptation (eye); Phonation; Context (archaeology); Speech synthesis; Voice analysis; Natural language processing; Psychology; Linguistics

Metrics

FWCI (Field Weighted Citation Impact): 1.02

Citation Normalized Percentile: 0.77

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech and dialogue systems

Physical Sciences → Computer Science → Artificial Intelligence


Related Documents

Enhancing End-to-End Speech Synthesis by Modeling Interrogative Sentences with Speaker Adaptation

Speaker voice normalization for end-to-end speech translation

Speaker Adaptation for Multichannel End-to-End Speech Recognition

Semi-Supervised Speaker Adaptation for End-to-End Speech Synthesis with Pretrained Models

Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis