Domain Generalization via Aggregation and Separation for Audio Deepfake Detection

Yuankun Xie; Haonan Cheng; Yutian Wang; Long Ye

doi:10.1109/tifs.2023.3324724

JOURNAL ARTICLE

Year: 2023; Journal: IEEE Transactions on Information Forensics and Security; Vol: 19; Pages: 344-358; Publisher: Institute of Electrical and Electronics Engineers

Abstract

In this paper, we propose an Aggregation and Separation Domain Generalization (ASDG) method for Audio DeepFake Detection (ADD). Unlike genuine speech, fake speech generated by different methods exhibits varied amplitude and frequency distributions. In addition, the spoofing attacks in training sets may not keep pace with the evolving diversity of real-world deepfake distributions. In light of this, we attempt to learn an ideal feature space that aggregates real speech and separates fake speech, to achieve better generalizability in the detection of unseen target domains. Specifically, we first propose a feature generator based on Lightweight Convolutional Neural Networks (LCNN), which generates a feature space and classifies the features as real or fake. Meanwhile, single-side domain adversarial learning is leveraged to make only the real speech from different domains indistinguishable, which enables the distribution of real speech to be aggregated in the feature space. Furthermore, a triplet loss is adopted to separate the distribution of fake speech while aggregating the distribution of real speech. Finally, to test the generalizability of the model, we train it on three different English datasets and evaluate it under harsh conditions: cross-language and noisy datasets. Extensive experiments show that ASDG outperforms the baseline models in cross-domain tasks and decreases the Equal Error Rate (EER) by up to 39.24% compared with RawNet2. These results demonstrate that the proposed Aggregation and Separation Domain Generalization method can be an effective strategy for improving model generalizability.
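The abstract reports results as Equal Error Rate (EER), the operating point at which the false acceptance rate (spoofed speech accepted as genuine) equals the false rejection rate (genuine speech rejected). The following is a minimal sketch of how EER can be estimated from detector scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that a higher score means "more likely genuine", and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Estimate the EER from two lists of detector scores.

    Assumption: a higher score means the detector considers the
    utterance more likely to be genuine. Returns (eer, threshold).
    """
    if not genuine_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))

    best_gap, best_eer, best_threshold = None, None, None
    for t in thresholds:
        # False rejection rate: genuine utterances scored below t.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: spoofed utterances scored at or above t.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_threshold = gap, (far + frr) / 2, t
    return best_eer, best_threshold


# Hypothetical toy scores, not taken from the paper.
genuine = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, spoof)
print(eer, threshold)  # 0.25 0.6
```

With a finite score set the two error rates rarely cross exactly, so this sketch reports the average of the two rates at the threshold where they are closest; evaluation toolkits often interpolate along the error curve instead.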

Keywords:

Computer science; Generalizability theory; Overfitting; Feature (linguistics); Generalization; Feature vector; Artificial intelligence; Domain (mathematical analysis); Extrapolation; Convolutional neural network; Pattern recognition (psychology); Speech recognition; Aggregate (composite); Frequency domain; Machine learning; Artificial neural network; Mathematics

Metrics

Cited By

FWCI (Field Weighted Citation Impact): 9.13

Refs: 114

Citation Normalized Percentile: 0.98

Is in top 1%

Is in top 10%

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Digital Media Forensic Detection

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Generalization of Audio Deepfake Detection

Harder or Different? Understanding Generalization of Audio Deepfake Detection

Multi-domain Multi-scale DeepFake Detection for Generalization

Cross-Domain Audio Deepfake Detection: Dataset and Analysis

Continual Unsupervised Domain Adaptation for Audio Deepfake Detection