Deepfakes are hyper-realistic videos in which faces are replaced, swapped, or forged using deep-learning models. These potent media manipulation techniques hold promise for applications across various domains. Yet they also present a significant risk when employed for malicious purposes such as identity fraud, phishing, spreading false information, and executing scams. In this work, we propose a novel and improved Deepfake video detector that uses a Convolutional Vision Transformer (CViT2), which builds on the concepts of our previous work (CViT). The CViT architecture consists of two components: a Convolutional Neural Network that extracts learnable features, and a Vision Transformer that categorizes these learned features using an attention mechanism. We trained and evaluated our model on five datasets, namely the Deepfake Detection Challenge Dataset (DFDC), FaceForensics++ (FF++), Celeb-DF v2, DeepfakeTIMIT, and TrustedMedia. On test sets unseen during training, we achieved accuracies of 95%, 94.8%, 98.3%, and 76.7% on the DFDC, FF++, Celeb-DF v2, and DeepfakeTIMIT datasets, respectively. In conclusion, our proposed Deepfake detector can be used in the battle against misinformation and in other forensic use cases.
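The two-component design described above (a CNN that extracts features, followed by a Transformer that classifies them with attention) can be illustrated with a minimal PyTorch sketch. This is not the CViT2 implementation: the class name `TinyConvViT`, the layer counts, channel widths, input size, and all other hyperparameters are assumptions chosen only to keep the example small.

```python
import torch
from torch import nn


class TinyConvViT(nn.Module):
    """Illustrative CNN + Transformer classifier (hypothetical, not CViT2)."""

    def __init__(self, image_size=64, dim=64, depth=2, heads=4, num_classes=2):
        super().__init__()
        # CNN stage: three conv blocks, each halving the spatial resolution.
        # Channel widths are assumed values, not taken from the paper.
        channels = [3, 16, 32, dim]
        blocks = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            blocks += [
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            ]
        self.features = nn.Sequential(*blocks)

        # Transformer stage: each spatial cell of the feature map is a token.
        grid = image_size // 8  # three 2x poolings
        num_tokens = grid * grid
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens + 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)  # real vs. fake logits

    def forward(self, x):
        fmap = self.features(x)                   # (B, dim, H/8, W/8)
        tokens = fmap.flatten(2).transpose(1, 2)  # (B, N, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        encoded = self.encoder(tokens)            # self-attention over tokens
        return self.head(encoded[:, 0])           # classify from the class token


# Usage: two random 64x64 RGB "face crops" produce two pairs of class logits.
model = TinyConvViT().eval()
with torch.no_grad():
    logits = model(torch.randn(2, 3, 64, 64))
```

Here the class token aggregates information from all feature-map positions through attention, and a single linear layer maps it to the two classes. A video-level decision would additionally require face extraction and aggregation over frames, which this sketch leaves out.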
Deressa Wodajo Deressa, Hannes Mareen, Peter Lambert, Solomon Atnafu, Zahid Akhtar, Glenn Van Wallendael
Abhijith Rajeev, P Sreejindeth, Shamnad CP, Rini T Paul, Anu Eldho