JOURNAL ARTICLE

Domain Generalization via Aggregation and Separation for Audio Deepfake Detection

Yuankun Xie, Haonan Cheng, Yutian Wang, Long Ye

Year: 2023
Journal: IEEE Transactions on Information Forensics and Security
Vol: 19
Pages: 344-358
Publisher: Institute of Electrical and Electronics Engineers

Abstract

In this paper, we propose an Aggregation and Separation Domain Generalization (ASDG) method for Audio DeepFake Detection (ADD). Fake speech generated by different methods exhibits varied amplitude and frequency distributions, unlike genuine speech. In addition, the spoofing attacks in training sets may not keep pace with the evolving diversity of real-world deepfake distributions. In light of this, we attempt to learn an ideal feature space that aggregates real speech and separates fake speech, to achieve better generalizability in the detection of unseen target domains. Specifically, we first propose a feature generator based on Lightweight Convolutional Neural Networks (LCNN), which generates a feature space and classifies the features as real or fake. Meanwhile, single-side domain adversarial learning is leveraged to make only the real speech from different domains indistinguishable, which enables the distribution of real speech to be aggregated in the feature space. Furthermore, a triplet loss is adopted to separate the distribution of fake speech while aggregating the distribution of real speech. Finally, to test the generalizability of the model, we train it on three different English datasets and evaluate it under harsh conditions: cross-language and noisy datasets. Extensive experiments show that ASDG outperforms the baseline models in cross-domain tasks and decreases the Equal Error Rate (EER) by up to 39.24% compared with RawNet2. These results indicate that the proposed Aggregation and Separation Domain Generalization method can be an effective strategy for improving model generalizability.
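
The role the abstract assigns to the triplet loss can be illustrated with a minimal sketch. It uses the standard margin-based formulation, max(0, d(anchor, positive) - d(anchor, negative) + margin), with Euclidean distance; the abstract does not give the exact loss, margin, or triplet-sampling scheme used in ASDG, so the function names, the margin value, and the toy two-dimensional features below are all assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss (assumed form, not taken from the paper).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`; otherwise it grows with the violation.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical toy features: two real utterances and two fake utterances.
real_a = [0.0, 0.0]
real_b = [0.0, 1.0]
fake_far = [3.0, 4.0]    # already well separated from the real cluster
fake_near = [0.0, 0.5]   # sits inside the real cluster

# Real anchor, real positive, fake negative.
loss_separated = triplet_loss(real_a, real_b, fake_far)
loss_violated = triplet_loss(real_a, real_b, fake_near)

print(loss_separated, loss_violated)
```

In this toy setting the well-separated fake sample contributes no loss, while the fake sample lying inside the real cluster is penalized, which is the pressure that pulls real features together and pushes fake features away from them.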

Keywords:
Computer science; Generalizability theory; Overfitting; Feature (linguistics); Generalization; Feature vector; Artificial intelligence; Domain (mathematical analysis); Extrapolation; Convolutional neural network; Pattern recognition (psychology); Speech recognition; Aggregate (composite); Frequency domain; Machine learning; Artificial neural network; Mathematics

Metrics

Cited By: 34
FWCI (Field Weighted Citation Impact): 9.13
Refs: 114
Citation Normalized Percentile: 0.98

Topics

Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Music and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Digital Media Forensic Detection (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)