Van-Nhan Tran, Hoanh-Su Le, Piljoo Choi, Suk-Hwan Lee, Ki-Ryong Kwon
Deepfakes are digitally manipulated videos that appear realistic but are actually fake. With the rapid advances in deep generative models, the accessibility and sophistication of such manipulation technologies are increasing, making it more challenging to detect fake content. Different facial forgery techniques result in complex data distributions, and most existing deepfake detection approaches rely on convolutional neural networks (CNNs) that treat the task as a binary classification problem. While these methods achieve high accuracy on specific datasets, their generalization performance across datasets is often poor due to overfitting to manipulation techniques seen during training. In this study, we propose a model called MEViT, which integrates the EfficientNet Vision Transformer with a meta-learning framework to enhance generalization in deepfake detection. Furthermore, we introduce a pair-discrimination loss to push the feature representations of fake samples away from those of real samples, and a domain adjustment loss to reduce domain shifts across different manipulation methods. The MEViT model is trained on a specific manipulation method in the FaceForensics++ dataset and evaluated on other unseen methods from the same dataset. Additionally, we conduct extensive experiments on multiple deepfake benchmarks, including FaceForensics++ and CelebDF-v2, and compare our method with various state-of-the-art approaches to demonstrate its effectiveness.
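The abstract does not give the exact formulation of the pair-discrimination loss; as a hedged illustration only, one plausible reading (pushing fake feature vectors at least a margin away from real ones via a hinge over pairwise distances) could be sketched as follows. The function name, margin value, and hinge form are assumptions, not the paper's definition:

```python
import numpy as np

def pair_discrimination_loss(real_feats, fake_feats, margin=1.0):
    """Hypothetical margin-based pair-discrimination loss: penalize
    real/fake feature pairs whose Euclidean distance falls below
    `margin`, which pushes fake representations away from real ones."""
    # Pairwise differences between every real and every fake vector: (R, F, D)
    diffs = real_feats[:, None, :] - fake_feats[None, :, :]
    # Euclidean distance for each real/fake pair: (R, F)
    dists = np.linalg.norm(diffs, axis=-1)
    # Hinge: zero loss once a pair is separated by at least `margin`
    return np.maximum(0.0, margin - dists).mean()

# Well-separated pairs incur no loss; overlapping pairs do.
real = np.array([[0.0, 0.0], [0.1, 0.0]])
far_fake = np.array([[5.0, 5.0]])       # already beyond the margin
near_fake = np.array([[0.05, 0.0]])     # inside the margin
assert pair_discrimination_loss(real, far_fake) == 0.0
assert pair_discrimination_loss(real, near_fake) > 0.0
```

In a full training loop this term would be combined with the binary cross-entropy objective and the domain adjustment loss the abstract mentions; the exact weighting is specified in the paper, not here.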