Levine, Gabriel; Thurlow, Drew; Levitan, Sarah Ita; Arfa, Jon
Tools for singing voice synthesis and music-focused vocal conversion have greatly improved in both quality and ease of use, leading to an explosion of music with synthetic vocals. This proliferation has made it difficult for listeners to discern human vocals from deepfake and synthetic ones. While there are robust approaches for detecting synthetic speech and vocal spoofing, identifying synthetic singing voices presents a unique set of challenges. In this paper, we present a new, publicly available dataset of labeled music tracks containing human and synthetic vocals, and we evaluate existing synthetic speech detection models on it. We also introduce a novel ensemble approach that combines high-level speech representations from HuBERT embeddings with a CNN classifier trained on traditional low-level audio features. Our evaluation confirms this to be an effective approach. We share our results, trained models, and labeled dataset to encourage future research.
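The abstract describes an ensemble of a HuBERT-embedding-based detector and a low-level-feature CNN but does not specify how their outputs are combined. The sketch below illustrates one common option, score-level (late) fusion by weighted averaging of per-track probabilities; the function name, weighting scheme, and example scores are illustrative assumptions, not the paper's method.

```python
import numpy as np

def ensemble_scores(hubert_scores, cnn_scores, weight=0.5):
    """Illustrative score-level fusion of two synthetic-vocal detectors.

    hubert_scores: per-track probabilities from a classifier on HuBERT embeddings
    cnn_scores:    per-track probabilities from a CNN on low-level audio features
    weight:        weight given to the HuBERT-based scores (assumed, not from the paper)
    """
    hubert_scores = np.asarray(hubert_scores, dtype=float)
    cnn_scores = np.asarray(cnn_scores, dtype=float)
    # Weighted average keeps the fused output in [0, 1] when inputs are probabilities.
    return weight * hubert_scores + (1.0 - weight) * cnn_scores

# Example: fuse hypothetical scores for three tracks.
fused = ensemble_scores([0.9, 0.2, 0.6], [0.7, 0.4, 0.6])
print(fused)  # [0.8 0.3 0.6]
```

A weighted average is only one choice; alternatives include logistic-regression stacking over the two score streams or majority voting over binary decisions.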