Although end-to-end text-to-speech (TTS) synthesizers produce human-like speech, they still lack intuitive user control over prosody. Modeling the prosody of interrogative sentences is challenging because question types vary widely. Synthesized intonation often lacks accuracy, richness, and detail when only a small amount of adaptation data from particular sentence types is available. This paper uses speaker adaptation to enhance the modeling of interrogative sentence prosody in speech synthesis, tested on an English dataset. The adaptation data were selected based on the occurrence of interrogative sentences: the first dataset contained frequent interrogative sentences, whereas the second contained declarative sentences. Two target speakers (one male, one female) were adapted. Objective and subjective evaluations show that the proposed model achieves strong performance in intonation. A MUSHRA subjective listening test showed better intonation patterns with the interrogative dataset than with the declarative one. Potential applications of this model include assistive technology for the visually impaired and chatbots/voice bots.
Ali Raheem Mandeel, Mohammed Salah Al-Radhi, Tamás Gábor Csapó