JOURNAL ARTICLE

Speech Cloning: Text-To-Speech Using VITS

Padmanaban R

Year: 2024, Journal: Engineering and Technology Journal, Vol: 09 (05)

Abstract

Voice is one of the most common and natural communication methods for humans. It is becoming the primary interface for AI voice assistants such as Amazon Alexa, as well as for cars, smart home devices, and similar products. As human-machine communication becomes more common, researchers are exploring technology that mimics genuine speech. Speech cloning is the practice of copying or mimicking another person's speech, usually with modern technology and artificial intelligence (AI). It entails producing a synthetic, or cloned, version of someone's voice that sounds very similar to the actual speaker. The objective is to produce speech that is indistinguishable from that of the real person in both tone and intonation. Instant Voice Cloning (IVC) in text-to-speech (TTS) synthesis refers to a TTS model's capacity to copy the voice of any reference speaker from a short audio sample, without extra speaker-specific training. This approach is usually referred to as zero-shot TTS. IVC gives users the flexibility to tailor the generated voice, which is valuable across diverse real-world applications. Examples include media content creation, personalized chatbots, and multi-modal interactions between humans and computers or large language models.

Keywords:
Speech recognition; Cloning (programming); Computer science; Biology

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.64
Refs: 11
Citation Normalized Percentile: 0.65

Topics

Speech and dialogue systems
Physical Sciences →  Computer Science →  Artificial Intelligence
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
