DISSERTATION

Transcription-Guided and Self-Supervised Speech Representations for Singing Voice Conversion

Betty Cortiñas Lorenzo

Year: 2022
University: Zenodo (CERN European Organization for Nuclear Research)
Publisher: European Organization for Nuclear Research

Abstract

Singing Voice Conversion is the task of converting the timbre of a source singer to that of a target singer without modifying content or intonation. One of the main concerns when building a Singing Voice Conversion system is the type of content representation used to ensure high intelligibility in the converted voices. One approach is to use the Cotatron encoder; however, it has a major drawback: it requires lyrics transcriptions as input. To remove this dependence on transcriptions, one can turn to Self-Supervised Speech Representations, a new area in Automatic Speech Recognition that seeks to extract robust latent representations from large-scale unlabeled speech corpora. A recent and popular family of such algorithms is VQ-Wav2Vec, which has already been applied to Speech Voice Conversion, but its use for Singing Voice Conversion has not yet been explored. In this master's thesis, we implement a new Singing Voice Conversion system using VQ-Wav2Vec features and compare its performance with that of Cotatron. We found through subjective listening tests and Word Error Rate calculation that self-supervised speech representations with VQ-Wav2Vec content features provide higher intelligibility than transcription-guided content features extracted with Cotatron. In addition, singer voice similarity is slightly improved when using VQ-Wav2Vec features.
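The Word Error Rate mentioned above is conventionally defined as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and a recognized transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used in the thesis: the function name `word_error_rate`, the lowercase and whitespace normalization, and the example lyrics are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: both texts are normalized only by lowercasing and
    splitting on whitespace; a real evaluation may normalize differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical lyrics: the reference line and what a recognizer
    # might output for a converted singing voice.
    reference_lyrics = "the river runs down to the sea"
    recognized_lyrics = "the river run down to sea"
    print(word_error_rate(reference_lyrics, recognized_lyrics))
```

A lower value indicates that more of the sung words survive conversion in recognizable form, which is why the metric can serve as an objective proxy for intelligibility alongside listening tests.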

Keywords:
Singing; Speech recognition; Transcription (linguistics); Computer science; Linguistics; Acoustics

Topics

Speech Recognition and Synthesis
Physical Sciences → Computer Science → Artificial Intelligence
Speech and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Music and Audio Processing
Physical Sciences → Computer Science → Signal Processing