JOURNAL ARTICLE

Voice Deepfake Detection Using the Self-Supervised Pre-Training Model HuBERT

Lanting Li, Tianliang Lu, Xingbang Ma, Mengjiao Yuan, Da Wan

Year: 2023 Journal: Applied Sciences Vol: 13 (14) Pages: 8488-8488 Publisher: Multidisciplinary Digital Publishing Institute

Abstract

In recent years, voice deepfake technology has developed rapidly, but current detection methods suffer from insufficient generalization and insufficient feature extraction for unknown attacks. This paper presents a forged speech detection method (HuRawNet2_modified) based on a self-supervised pre-trained model (HuBERT) to address these problems. A combination of impulsive signal-dependent additive noise and additive white Gaussian noise was adopted for data boosting and augmentation, and the HuBERT model was fine-tuned on databases in different languages. On this basis, the size of the extracted feature maps was modified independently by the α-feature map scaling (α-FMS) method, within a modified end-to-end method that uses the RawNet2 model as the backbone. The results showed that the HuBERT model could extract features more comprehensively and accurately. The best results were an equal error rate (EER) of 2.89% and a minimum tandem detection cost function (min t-DCF) of 0.2182 on the ASVspoof 2021 LA challenge database, which verified the effectiveness of the proposed detection method. Compared with the baseline systems on the ASVspoof 2021 LA challenge and FMFCC-A databases, both EER and min t-DCF decreased. The results also showed that the self-supervised pre-trained model with fine-tuning can extract acoustic features across languages, and that detection improves slightly when the pre-training database and the fine-tuning and test databases are in the same language.
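The equal error rate quoted in the abstract is the operating point at which the false acceptance rate (spoofed speech accepted) equals the false rejection rate (bona fide speech rejected). The following is a minimal sketch of that computation, not the paper's or the ASVspoof challenge's evaluation code. It assumes that a higher score means "more likely bona fide"; the function name `equal_error_rate` and the toy scores are illustrative only, and official toolkits may interpolate between thresholds differently.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for a detector whose scores are higher for bona fide speech.

    Illustrative helper (not from the paper): sweeps every observed score as a
    threshold and picks the one where the two error rates are closest.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.unique(np.concatenate([bonafide, spoof]))  # sorted ascending

    # False rejection rate: bona fide trials scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER is taken where |FAR - FRR| is smallest, averaging the two rates.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


if __name__ == "__main__":
    # Toy scores, invented for illustration only.
    bonafide = [0.9, 0.8, 0.7, 0.4]
    spoof = [0.1, 0.2, 0.3, 0.6]
    eer, thr = equal_error_rate(bonafide, spoof)
    print(f"EER = {eer:.2%} at threshold {thr}")
```

With these toy scores one bona fide trial and one spoofed trial fall on the wrong side of the chosen threshold, so both error rates are one in four. The min t-DCF also reported in the abstract additionally depends on a speaker verification system and on cost parameters, so it is not covered by this sketch.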

Keywords:
Computer science; Generalization; Pattern recognition (psychology); Artificial intelligence; Mixture model; Feature (linguistics); Boosting (machine learning); Speech recognition; Machine learning; Mathematics

Metrics

Cited By: 22
FWCI (Field Weighted Citation Impact): 5.62
Refs: 29
Citation Normalized Percentile: 0.95

Topics

Speech Recognition and Synthesis
Physical Sciences → Computer Science → Artificial Intelligence
Speech and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Music and Audio Processing
Physical Sciences → Computer Science → Signal Processing

Related Documents

JOURNAL ARTICLE

HuBERT Ensemble Models for Singing Voice Deepfake Detection

Levine, Gabriel; Thurlow, Drew; Levitan, Sarah Ita; Arfa, Jon

Journal: Zenodo (CERN European Organization for Nuclear Research) Year: 2025
BOOK-CHAPTER

Audio Deepfake Detection via Dual Branch Classifier with Self-Supervised Pre-Trained Model

Yuxuan Cao, Lijian Gao, Qirong Mao

Communications in computer and information science Year: 2026 Pages: 313-327