Abhijith Rajeev, P Sreejindeth, Shamnad CP, Rini T Paul, Anu Eldho
This real-time framework uses the Video Vision Transformer (ViViT) to address the growing threat of deceptive multimedia content. The proposed model extracts temporal and spatial features from video frames, effectively discerning authentic content from manipulated deep fake videos. Trained on a diverse dataset, the transformer-based architecture captures long-range dependencies and contextual information crucial for accurate detection. To optimize real-time applicability, the system incorporates efficient attention mechanisms and parallel processing techniques during inference. The model exhibits robust performance against various deep fake generation techniques, including face swaps and lip-sync manipulations. Evaluation results show high accuracy, precision, and recall, underscoring the system's efficacy. The significance of this work lies in its contribution to countering the malicious use of deep fake technology, offering a reliable and efficient real-time detection mechanism. The proposed approach aims to protect the integrity of multimedia content across applications such as social media and journalism. Overall, the system provides a practical solution to the pressing challenges posed by deep fake proliferation in the digital landscape.
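The abstract's core pipeline — splitting a clip into spatio-temporal tokens, mixing them with attention, and scoring the clip as real or fake — can be sketched as follows. This is an illustrative NumPy toy with random weights, not the paper's implementation; the dimensions, tubelet size, and single attention head are all assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only (not from the paper).
T, H, W, C = 8, 32, 32, 3   # frames, height, width, channels
t, p = 2, 8                 # tubelet size: 2 frames x 8x8 spatial patches
d = 16                      # token embedding dimension

video = rng.standard_normal((T, H, W, C))

# Tubelet embedding (as in ViViT): partition the clip into t x p x p x C
# blocks and linearly project each flattened block to a d-dim token.
n_t, n_h, n_w = T // t, H // p, W // p
blocks = video.reshape(n_t, t, n_h, p, n_w, p, C)
blocks = blocks.transpose(0, 2, 4, 1, 3, 5, 6).reshape(n_t * n_h * n_w, -1)
W_embed = rng.standard_normal((blocks.shape[1], d)) * 0.02
tokens = blocks @ W_embed   # (num_tokens, d) = (64, 16)

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over all tokens,
    letting every tubelet attend across both space and time."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(x.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.02 for _ in range(3))
attended = self_attention(tokens, Wq, Wk, Wv)

# Mean-pool the tokens and apply a linear head for a real/fake score.
w_head = rng.standard_normal(d)
logit = attended.mean(axis=0) @ w_head
prob_fake = 1.0 / (1.0 + np.exp(-logit))
print(tokens.shape, 0.0 < prob_fake < 1.0)
```

A trained model would learn `W_embed`, the attention projections, and the head from labeled real/fake clips; the attention step is what captures the long-range spatio-temporal dependencies the abstract refers to.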