JOURNAL ARTICLE

HuBERT Ensemble Models for Singing Voice Deepfake Detection

Levine, Gabriel; Thurlow, Drew; Levitan, Sarah Ita; Arfa, Jon

Year: 2025
Journal: Zenodo (CERN European Organization for Nuclear Research)
Publisher: European Organization for Nuclear Research

Abstract

Tools for music-focused singing voice synthesis and voice conversion have improved greatly in both quality and ease of use, leading to an explosion of music with synthetic vocals. This proliferation has made it difficult for listeners to distinguish human vocals from deepfake and synthetic ones. While robust approaches exist for detecting synthetic speech and voice spoofing, identifying synthetic singing voices presents a distinct set of challenges. In this paper, we present a new, publicly available dataset of labeled music tracks containing human and synthetic vocals, and we use it to evaluate existing synthetic speech detection models. We also introduce a novel ensemble approach that combines high-level speech representations from HuBERT embeddings with a CNN classifier that uses traditional low-level audio features. Our evaluation confirms that this is an effective approach. We share our results, trained models, and labeled dataset to encourage future research.
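The abstract does not state how the HuBERT branch and the CNN branch are combined. As a minimal sketch, assuming a weighted soft-voting (probability-averaging) fusion, which may differ from the authors' actual method, the ensemble decision could look like this. The class name `SoftVotingEnsemble`, the `weight` and `threshold` parameters, and the branch scorers are all hypothetical stand-ins, not the authors' released code.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical branch signature: maps an audio clip (a sequence of samples)
# to an estimated probability that the vocals are synthetic, in [0, 1].
# In a real system one branch might score pooled HuBERT embeddings and the
# other might be a CNN over low-level features; here both are placeholders.
Scorer = Callable[[Sequence[float]], float]


@dataclass
class SoftVotingEnsemble:
    hubert_branch: Scorer
    cnn_branch: Scorer
    weight: float = 0.5     # assumed weight on the HuBERT branch
    threshold: float = 0.5  # assumed decision threshold on the fused score

    def __post_init__(self) -> None:
        if not 0.0 <= self.weight <= 1.0:
            raise ValueError("weight must lie in [0, 1]")

    def score(self, clip: Sequence[float]) -> float:
        """Weighted average of the two branch probabilities."""
        p_hubert = self.hubert_branch(clip)
        p_cnn = self.cnn_branch(clip)
        return self.weight * p_hubert + (1.0 - self.weight) * p_cnn

    def predict(self, clip: Sequence[float]) -> str:
        """Label the clip by comparing the fused score to the threshold."""
        return "synthetic" if self.score(clip) >= self.threshold else "human"


if __name__ == "__main__":
    # Toy constant scorers stand in for trained models.
    ens = SoftVotingEnsemble(lambda clip: 0.9, lambda clip: 0.4)
    clip = [0.0] * 16000  # one second of silence at an assumed 16 kHz
    print(ens.score(clip), ens.predict(clip))
```

Soft voting is only one plausible fusion rule; a stacked meta-classifier or concatenating the two feature sets before a single classifier would be equally consistent with the abstract's description.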
