Speaker Anonymization Using Neural Audio Codec Language Models

Michele Panariello; Francesco Nespoli; Massimiliano Todisco; Nicholas Evans

doi:10.1109/icassp48485.2024.10447871

JOURNAL ARTICLE

Year: 2024 Pages: 4725-4729

Abstract

The vast majority of approaches to speaker anonymization involve the extraction of fundamental frequency estimates, linguistic features and a speaker embedding which is perturbed to obfuscate the speaker identity before an anonymized speech waveform is resynthesized using a vocoder. Recent work has shown that x-vector transformations are difficult to control consistently: other sources of speaker information contained within fundamental frequency and linguistic features are re-entangled upon vocoding, meaning that anonymized speech signals still contain speaker information. We propose an approach based upon neural audio codecs (NACs), which are known to generate high-quality synthetic speech when combined with language models. NACs use quantized codes, which are known to effectively bottleneck speaker-related information: we demonstrate the potential of speaker anonymization systems based on NAC language modeling by applying the evaluation framework of the Voice Privacy Challenge 2022.
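The bottleneck effect of quantized codes mentioned in the abstract can be illustrated with a toy sketch. It assumes residual vector quantization (RVQ), the scheme used by codecs such as SoundStream and EnCodec; the abstract does not say which quantizer the authors use. The codebooks here are random rather than learned, and all names (`nearest_code`, `rvq_encode`, `rvq_decode`) are hypothetical helpers for illustration only, not part of the paper's system.

```python
import random


def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector`
    (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def rvq_encode(vector, codebooks):
    """Residual vector quantization: quantize, subtract the chosen entry,
    then quantize what is left with the next codebook."""
    residual = list(vector)
    codes = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruct a vector by summing the selected entry of each stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Toy setup (assumption): random codebooks stand in for learned ones.
rng = random.Random(0)
DIM, NUM_STAGES, CODEBOOK_SIZE = 4, 2, 8
codebooks = [
    [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(CODEBOOK_SIZE)]
    for _ in range(NUM_STAGES)
]

frame = [rng.uniform(-1, 1) for _ in range(DIM)]  # one continuous feature frame
codes = rvq_encode(frame, codebooks)              # a few small integers
reconstruction = rvq_decode(codes, codebooks)     # lossy approximation of `frame`
```

Each frame is reduced to `NUM_STAGES` integers drawn from a vocabulary of `CODEBOOK_SIZE` entries, so any detail that the codebooks cannot represent is discarded. A language model can then operate on these discrete codes as tokens.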

Keywords:

Computer science; Speech recognition; Speaker diarisation; Codec; Identity (music); Speaker recognition; Embedding; Concatenation (mathematics); Natural language processing; Language model; Artificial intelligence

Metrics

FWCI (Field Weighted Citation Impact): 11.50

Citation Normalized Percentile: 0.98

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Generating Sample-Based Musical Instruments Using Neural Audio Codec Language Models

Codec-ASV: Exploring Neural Audio Codec For Speaker Representation Learning