JOURNAL ARTICLE

Fine-Grained Emotional Control of Text-to-Speech: Learning to Rank Inter- and Intra-Class Emotion Intensities

Abstract

State-of-the-art Text-To-Speech (TTS) models are capable of producing high-quality speech. The generated speech, however, is usually neutral in emotional expression, whereas one often wants fine-grained emotional control at the word or phoneme level. Although this remains challenging, the first TTS models that control the voice through manually assigned emotion intensity have recently been proposed. Unfortunately, because they neglect intra-class distance, the intensity differences are often unrecognizable. In this paper, we propose a fine-grained controllable emotional TTS model that considers both inter- and intra-class distances and is able to synthesize speech with recognizable intensity differences. Our subjective and objective experiments demonstrate that our model outperforms two state-of-the-art controllable TTS models in controllability, emotion expressiveness, and naturalness.

Keywords:
Naturalness, Computer science, Controllability, Speech recognition, Speech synthesis, Class (philosophy), Control (management), Quality (philosophy), Rank (graph theory), Artificial intelligence, Natural language processing, Mathematics

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 1.28
Refs: 26
Citation Normalized Percentile: 0.79

Topics

Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)
Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Speech and dialogue systems (Physical Sciences → Computer Science → Artificial Intelligence)

Related Documents

JOURNAL ARTICLE

EMOQ-TTS: Emotion Intensity Quantization for Fine-Grained Controllable Emotional Text-to-Speech

Chae-Bin Im, Sang-Hoon Lee, Seung-Bin Kim, Seong-Whan Lee

Journal: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Year: 2022 Pages: 6317-6321

JOURNAL ARTICLE

Text-Based Fine-Grained Emotion Prediction

Gargi Singh, Dhanajit Brahma, Piyush Rai, Ashutosh Modi

Journal: IEEE Transactions on Affective Computing Year: 2023 Vol: 15 (2) Pages: 405-416

BOOK-CHAPTER

Fine-Grained Style Control in VITS-Based Text-to-Speech Synthesis

Zhong Huihang, Dengfeng Ke, Ya Li, Wenhan Yao, Wenqian Bao

Communications in Computer and Information Science Year: 2023 Pages: 139-147