JOURNAL ARTICLE

Self-Supervised Video Forensics by Audio-Visual Anomaly Detection

Abstract

Manipulated videos often contain subtle inconsistencies between their visual and audio signals. We propose a video forensics method, based on anomaly detection, that can identify these inconsistencies, and that can be trained solely using real, unlabeled data. We train an autoregressive model to generate sequences of audio-visual features, using feature sets that capture the temporal synchronization between video frames and sound. At test time, we then flag videos that the model assigns low probability. Despite being trained entirely on real videos, our model obtains strong performance on the task of detecting manipulated speech videos. Project site: https://cfeng16.github.io/audio-visual-forensics.

Keywords:
Computer science Anomaly detection Artificial intelligence Feature (linguistics) Autoregressive model Audio visual Task (project management) Visualization Speech recognition Computer vision Pattern recognition (psychology) Multimedia

Metrics

54
Cited By
9.83
FWCI (Field Weighted Citation Impact)
155
Refs
0.98
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Digital Media Forensic Detection
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Anomaly Detection Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence
Video Analysis and Summarization
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

© 2026 ScienceGate Book Chapters — All rights reserved.