Speech Cloning: Text-To-Speech Using VITS

Padmanaban R

doi:10.5281/zenodo.11158985

JOURNAL ARTICLE

Year: 2024; Journal: Engineering and Technology Journal; Vol. 09 (05)

Abstract

Voice is one of the most common and natural methods of human communication. It is becoming the primary interface for AI voice assistants such as Amazon Alexa, as well as for cars, smart home devices, and similar products. As human-machine communication becomes more widespread, researchers are exploring technology that mimics genuine speech. Speech cloning is the practice of copying or mimicking another person's voice, usually with modern technology and artificial intelligence (AI). It entails producing a synthetic, or cloned, version of someone's voice that sounds very similar to the actual speaker, with the objective of generating speech that is indistinguishable from the genuine person in both tone and intonation. Instant Voice Cloning (IVC) in text-to-speech (TTS) synthesis refers to a TTS model's ability to copy the voice of any reference speaker from a short audio sample, without extra speaker-specific training; this capability is commonly called zero-shot TTS. IVC gives users the flexibility to tailor the generated voice, which is valuable across diverse real-world applications, including media content creation, personalized chatbots, and multi-modal interaction between humans and computers or large language models.
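The zero-shot idea described above (cloning from a short reference clip with no speaker-specific training) is commonly realised by mapping the clip to a fixed-size speaker embedding that conditions the synthesizer. The sketch below is a hypothetical illustration under that assumption, not the paper's method or any library's API: `speaker_embedding` stands in for a trained speaker encoder using simple mean pooling, `condition_tokens` is a hypothetical way of attaching the embedding to text-encoder outputs, and `cosine_similarity` is the usual way to compare two speaker embeddings. Cloning a new voice involves only this forward computation; no model weights are updated.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def speaker_embedding(frames: Sequence[Frame]) -> List[float]:
    """Hypothetical speaker encoder: mean-pool frame features, then L2-normalise.

    A real zero-shot system would use a trained neural encoder; mean pooling
    only illustrates that clips of any length map to one fixed-size vector.
    """
    if not frames:
        raise ValueError("reference clip has no frames")
    dim = len(frames[0])
    pooled = [sum(f[i] for f in frames) / len(frames) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in pooled))
    if norm == 0.0:
        raise ValueError("reference clip has zero energy")
    return [x / norm for x in pooled]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def condition_tokens(token_vectors: Sequence[Sequence[float]],
                     spk: Sequence[float]) -> List[List[float]]:
    """Hypothetical conditioning: append the same speaker embedding to every
    text-token vector, so an unseen voice needs no retraining."""
    return [list(t) + list(spk) for t in token_vectors]


if __name__ == "__main__":
    emb_a = speaker_embedding([[1.0, 0.0], [3.0, 0.0]])  # toy "speaker A" clip
    emb_b = speaker_embedding([[0.0, 2.0], [0.0, 4.0]])  # toy "speaker B" clip
    print(emb_a, emb_b, cosine_similarity(emb_a, emb_b))
    print(condition_tokens([[0.1], [0.2]], emb_a))
```

Normalising the embedding makes it independent of overall loudness and clip length in this toy setup, and the same cosine measure is often used to check how closely a cloned voice matches its reference.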

Keywords:

Speech recognition, Cloning (programming), Computer science, Biology

Metrics

FWCI (Field Weighted Citation Impact): 0.64

Citation Normalized Percentile: 0.65

Topics

Speech and dialogue systems

Physical Sciences → Computer Science → Artificial Intelligence

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Enhancing Sinhala Text-to-Speech with End-to-End VITS Architecture

Text to Speech Bahasa Jawa dialek Solo-Jogja dengan Metode VITS

Fine-Grained Style Control in VITS-Based Text-to-Speech Synthesis

Efficient English Text-to-Speech Voice Cloning Using Limited Speaker Data