DFDT: An End-to-End DeepFake Detection Framework Using Vision Transformer

Aminollah Khormali; J.S. Yuan

doi:10.3390/app12062953


JOURNAL ARTICLE


Year: 2022; Journal: Applied Sciences; Vol: 12 (6); Pages: 2953; Publisher: Multidisciplinary Digital Publishing Institute


Abstract

The ever-growing threat of deepfakes and their large-scale societal implications have propelled the development of deepfake forensics to ascertain the trustworthiness of digital media. Existing detection methods commonly use Convolutional Neural Networks (CNNs) as a backbone. While CNNs perform reasonably well at learning local discriminative information, they fail to learn relative spatial features and lose important information because of their constrained receptive fields. Motivated by these challenges, this work presents DFDT, an end-to-end deepfake detection framework that leverages the characteristics of transformer models to learn hidden traces of perturbations from both local image features and the global relationships of pixels at different forgery scales. DFDT is designed specifically for deepfake detection and consists of four main components: patch extraction and embedding, a multi-stream transformer block, attention-based patch selection, and a multi-scale classifier. DFDT’s transformer layer uses a re-attention mechanism instead of a traditional multi-head self-attention layer. To evaluate DFDT, a comprehensive set of experiments is conducted on several deepfake forensics benchmarks. The results demonstrate DFDT’s high detection rate: 99.41%, 99.31%, and 81.35% on FaceForensics++, Celeb-DF (V2), and WildDeepfake, respectively. Moreover, DFDT’s strong cross-dataset and cross-manipulation generalization provides further evidence of its effectiveness.
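The abstract names two steps that are easy to illustrate in isolation: splitting an image into patch tokens at more than one scale, and replacing plain multi-head self-attention with re-attention. The sketch below is a minimal illustration in plain Python, not the authors' implementation. All function names (`extract_patches`, `attention_maps`, `re_attention`) are hypothetical, the patch sizes are arbitrary, and the re-attention step assumes the common formulation in which per-head attention maps are mixed by a head-by-head matrix `theta`; the abstract does not confirm that DFDT uses exactly this form, and the normalization usually applied after mixing is omitted here.

```python
import math

def extract_patches(image, patch):
    """Split a 2D image (list of rows) into non-overlapping patch x patch
    blocks, each flattened row-major into one token vector."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_maps(heads_q, heads_k):
    """Return one softmax(Q K^T / sqrt(d)) map per head.
    Inputs are indexed as [head][token][dim]."""
    maps = []
    for q, k in zip(heads_q, heads_k):
        d = len(q[0])
        maps.append([softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
                              for kj in k])
                     for qi in q])
    return maps

def re_attention(maps, theta):
    """Mix per-head attention maps across heads (assumed formulation):
    mixed[g] = sum over h of theta[h][g] * maps[h]."""
    n_heads, n = len(maps), len(maps[0])
    return [[[sum(theta[h][g] * maps[h][i][j] for h in range(n_heads))
              for j in range(n)]
             for i in range(n)]
            for g in range(n_heads)]

# Two patch scales over the same toy 4x4 "image" (sizes chosen for illustration).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens_small = extract_patches(image, 2)  # 4 tokens, each of length 4
tokens_large = extract_patches(image, 4)  # 1 token of length 16
```

With `theta` set to the identity matrix, `re_attention` returns the per-head maps unchanged, so ordinary multi-head attention is the special case in which heads do not exchange information. In a trained model `theta` would be learned, and the mixed maps would then be applied to the value vectors as usual.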

Keywords: Computer science; Artificial intelligence; Transformer; Convolutional neural network; Machine learning; Discriminative model; End-to-end principle; Pattern recognition (psychology)

Metrics

FWCI (Field Weighted Citation Impact): 7.92

Citation Normalized Percentile: 0.98

Topics

Digital Media Forensic Detection

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Generative Adversarial Networks and Image Synthesis

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Anomaly Detection Techniques and Applications

Physical Sciences → Computer Science → Artificial Intelligence

