JOURNAL ARTICLE

Multi-modal Deepfake Detection via Multi-task Audio-Visual Prompt Learning

Hui Miao, Yuanfang Guo, Zeming Liu, Yunhong Wang

Year: 2025
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Vol: 39 (1)
Pages: 612-621
Publisher: Association for the Advancement of Artificial Intelligence

Abstract

With the malicious use and dissemination of multi-modal deepfake videos, researchers have begun to investigate multi-modal deepfake detection. Unfortunately, most existing methods tune all the parameters of the deep network on limited speech video datasets and are trained under coarse-grained consistency supervision, which hinders their generalization ability in practical scenarios. To solve these problems, we propose the first multi-task audio-visual prompt learning method for multi-modal deepfake video detection, which exploits multiple foundation models. Specifically, we construct a two-stream multi-task learning architecture and propose sequential visual prompts and short-time audio prompts to extract multi-modal features, which are aligned at the frame level and used in subsequent fine-grained feature matching and fusion. Because visual content and audio signals are naturally aligned in real data, we propose a frame-level cross-modal feature matching loss function to learn fine-grained audio-visual consistency. Comprehensive experiments demonstrate the effectiveness of our method and its superior generalization ability compared with state-of-the-art methods.
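
The abstract does not give the form of the frame-level cross-modal feature matching loss, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes that each clip has already been reduced to one visual and one audio feature vector per aligned frame, and it swaps in a plain cosine-similarity objective: frame pairs from real clips are pulled toward similarity 1, and frame pairs from fake clips are penalised only when their similarity exceeds a margin. The names `cosine`, `frame_level_matching_loss` and `margin` are hypothetical.

```python
import math


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    # eps guards against division by zero for all-zero vectors.
    return dot / ((norm_u + eps) * (norm_v + eps))


def frame_level_matching_loss(visual, audio, is_real, margin=0.0):
    """Hypothetical frame-level audio-visual matching loss (an assumption).

    visual, audio: lists of T feature vectors, one per aligned frame.
    is_real: True for a genuine clip, False for a manipulated one.
    margin: similarity that fake frame pairs are allowed to reach unpenalised.
    """
    if len(visual) != len(audio) or not visual:
        raise ValueError("expected two non-empty sequences of equal length")

    # One similarity score per frame, so supervision is applied per frame
    # rather than once per clip.
    sims = [cosine(v, a) for v, a in zip(visual, audio)]

    if is_real:
        # Real clips: pull every frame pair toward similarity 1.
        per_frame = [1.0 - s for s in sims]
    else:
        # Fake clips: penalise only frames more similar than the margin.
        per_frame = [max(0.0, s - margin) for s in sims]

    return sum(per_frame) / len(per_frame)
```

Under these assumptions, a real clip whose audio and visual features coincide at every frame gives a loss close to zero, while the same features labelled as fake give a loss close to one. Averaging over frames rather than over whole clips is what makes the supervision fine-grained in this sketch.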

Keywords:
Computer science, Task (project management), Modal, Audio visual, Artificial intelligence, Speech recognition, Natural language processing, Human–computer interaction, Multimedia, Engineering, Chemistry

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 42
Citation Normalized Percentile: 0.36

Topics

Digital Media Forensic Detection
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Anomaly Detection Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence
Generative Adversarial Networks and Image Synthesis
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition