JOURNAL ARTICLE

Towards Spatio-temporal Collaborative Learning: An End-to-End Deepfake Video Detection Framework

Abstract

With the rapid development of facial tampering techniques, deepfake detection has attracted widespread public concern. Most existing video-based methods apply temporal convolution to learn temporal discontinuities directly, and in doing so may overlook both local detail mutations and inconsistent global expression semantics along the temporal dimension. This makes it difficult to learn more discriminative forgery cues. To mitigate this issue, we introduce a novel deepfake video detection framework specifically designed to capture fine-grained traces of tampering. Concretely, we first present a Multi-layered Feature Extraction module (MFE) that constructs comprehensive spatio-temporal representations by stitching together features from different levels. We then propose a Bidirectional temporal Artifact Enhancement module (BAE), which exploits local differences between adjacent frames to enhance frame-level features. Moreover, we present a Cross temporal Stride Aggregation strategy (CSA) to mine inconsistent global semantics and adaptively obtain multi-timescale representations. Extensive experiments on several benchmarks demonstrate that the proposed method outperforms competitive state-of-the-art approaches.
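
The abstract gives no implementation details, so the following is only a minimal sketch of the two temporal ideas it names: enhancing each frame-level feature with differences to both neighbouring frames, and pooling frame features sampled at several temporal strides. Everything here is an assumption made for illustration, not the authors' method: the function names `bidirectional_differences` and `stride_aggregate`, the use of absolute differences, the additive enhancement, the stride values, and the uniform averaging across strides (the paper describes an adaptive combination, which is not reproduced here).

```python
from typing import List, Sequence

Vector = List[float]


def bidirectional_differences(frames: List[Vector]) -> List[Vector]:
    """Add backward and forward absolute differences to each frame feature.

    Hypothetical stand-in for adjacent-frame enhancement; at the sequence
    boundaries the missing neighbour is replaced by the frame itself, so
    that side contributes a zero difference.
    """
    n = len(frames)
    enhanced = []
    for t, feat in enumerate(frames):
        prev_feat = frames[t - 1] if t > 0 else feat
        next_feat = frames[t + 1] if t < n - 1 else feat
        enhanced.append([
            x + abs(x - p) + abs(nx - x)
            for x, p, nx in zip(feat, prev_feat, next_feat)
        ])
    return enhanced


def stride_aggregate(frames: List[Vector], strides: Sequence[int] = (1, 2, 4)) -> Vector:
    """Mean-pool frames sampled at each stride, then average across strides.

    Assumption: uniform weights across strides, chosen only for simplicity.
    """
    dim = len(frames[0])
    pooled = []
    for s in strides:
        sampled = frames[::s]  # every s-th frame, starting at frame 0
        pooled.append([sum(f[d] for f in sampled) / len(sampled) for d in range(dim)])
    return [sum(p[d] for p in pooled) / len(pooled) for d in range(dim)]


# Toy sequence: 4 frames, 2-dimensional features (made-up values).
frames = [[0.0, 1.0], [1.0, 1.0], [3.0, 1.0], [3.0, 1.0]]
enhanced = bidirectional_differences(frames)
video_feature = stride_aggregate(enhanced, strides=(1, 2))
```

In this toy sequence the first feature dimension jumps between frames while the second stays constant, so only the first dimension is amplified by the difference terms. A real system would operate on learned feature maps rather than hand-written vectors.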

Keywords:
Computer science; Semantics (computer science); Artificial intelligence; Image stitching; Feature extraction; Feature (linguistics); Convolution (computer science); Discriminative model; Machine learning; Pattern recognition (psychology); Artificial neural network

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.36
Refs: 30
Citation Normalized Percentile: 0.53

Topics

Digital Media Forensic Detection
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Generative Adversarial Networks and Image Synthesis
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Anomaly Detection Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

SFormer: An end-to-end spatio-temporal transformer architecture for deepfake detection

Staffy Kingra, Naveen Aggarwal, Nirmal Kaur

Journal: Forensic Science International Digital Investigation, Year: 2024, Vol: 51, Pages: 301817-301817
JOURNAL ARTICLE

End-to-end Multi-task Learning Framework for Spatio-Temporal Grounding in Video Corpus

Yingqi Gao, Zhiling Luo, Shiqian Chen, Wei Zhou

Journal: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Year: 2022, Pages: 3958-3962
JOURNAL ARTICLE

STCA-net: spatio-temporal collaborative attention network for deepfake video detection

Jianping Li, Jing Sun, Yanyi Meng, Kang Xu

Journal: Engineering Research Express, Year: 2025, Vol: 7 (3), Pages: 035286-035286