With the rapid development of facial tampering techniques, the deepfake detection task has attracted widespread social concern. Most existing video-based methods adopt temporal convolution to learn temporal discontinuities directly, and may therefore neglect both local detail mutations and inconsistent global expression semantics along the temporal dimension. This makes it difficult to learn more discriminative forgery cues. To mitigate this issue, we introduce a novel deepfake video detection framework specifically designed to capture fine-grained traces of tampering. Concretely, we first present a Multi-layered Feature Extraction module (MFE) that constructs comprehensive spatio-temporal representations by stitching together features from different levels. Afterward, we propose a Bidirectional temporal Artifact Enhancement module (BAE), which exploits local differences between adjacent frames to enhance frame-level features. Moreover, we present a Cross temporal Stride Aggregation strategy (CSA) to mine inconsistent global semantics and adaptively obtain multi-timescale representations. Extensive experiments on several benchmarks demonstrate that the proposed method achieves state-of-the-art performance, outperforming other competitive approaches.
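The bidirectional artifact-enhancement idea described above can be illustrated with a minimal sketch: compute forward and backward differences between adjacent frame features and use their magnitudes to emphasize frames whose features change abruptly. This is a hypothetical simplification for intuition, not the paper's exact BAE formulation (the function name, the 0.5 weighting, and the absolute-difference enhancement are all assumptions).

```python
import numpy as np

def bidirectional_artifact_enhancement(feats):
    """Enhance per-frame features with local temporal differences.

    feats: array of shape (T, C) -- one feature vector per video frame.
    Returns an array of the same shape. Illustrative sketch only; the
    paper's BAE module is not specified at this level of detail.
    """
    fwd = np.zeros_like(feats)          # difference toward the next frame
    bwd = np.zeros_like(feats)          # difference from the previous frame
    fwd[:-1] = feats[1:] - feats[:-1]
    bwd[1:] = feats[1:] - feats[:-1]
    # Emphasize frames whose features mutate abruptly in either direction;
    # a temporally constant clip is left unchanged.
    return feats + 0.5 * (np.abs(fwd) + np.abs(bwd))

frames = np.random.rand(8, 16).astype(np.float32)
enhanced = bidirectional_artifact_enhancement(frames)
print(enhanced.shape)  # (8, 16)
```

A deepfake often introduces frame-local detail mutations, so amplifying adjacent-frame differences in this way makes such discontinuities more salient to a downstream classifier.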
Staffy Kingra, Naveen Aggarwal, Nirmal Kaur
Yingqi Gao, Zhiling Luo, Shiqian Chen, Wei Zhou
Ruyi Chang, Haopeng Wang, Zhitian Zhang, Dejiao Huang, Shuai Guo
Jorge Pessoa, Helena Aidos, Pedro Tomás, Mário A. T. Figueiredo
Jianping Li, Jing Sun, Yanyi Meng, Kang Xu