JOURNAL ARTICLE

End-to-End Learning of Video Compression using Spatio-Temporal Autoencoders

Abstract

Deep learning (DL) is revolutionizing image and video processing and now achieves state-of-the-art performance in many tasks. Video compression, however, has so far resisted the DL revolution: current attempts rely on complex solutions that interconnect multiple networks to mimic the different layers of conventional codecs. Since DL approaches usually excel when models are allowed to learn their own feature set, a different solution is proposed herein: end-to-end learning of a single network that explicitly avoids motion estimation/prediction. We formalize video compression as the rate-distortion optimization of a single spatio-temporal autoencoder, jointly learning a latent-space projection transform and a synthesis transform for low-bitrate video compression. The quantizer uses a rounding scheme, relaxed during training, together with an entropy estimation technique that enforces an information bottleneck. The resulting video compression network shows competitive performance against standard codecs (MPEG-4 Part 2, H.264/AVC, H.265/HEVC), particularly at low bitrates, even while avoiding any motion prediction/compensation method.
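The training objective described above can be sketched concretely. The snippet below is a minimal illustration, not the paper's implementation: it assumes a common relaxation of rounding (additive uniform noise during training), a histogram-based entropy estimate as a stand-in for the learned entropy model, and toy linear transforms in place of the spatio-temporal analysis/synthesis networks. The function names (`quantize`, `rate_estimate`, `rd_loss`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(latent, training):
    """Scalar quantization by rounding. During training the hard
    rounding is relaxed with additive uniform noise in [-0.5, 0.5)
    so gradients can flow (a common relaxation; the paper's exact
    scheme may differ)."""
    if training:
        return latent + rng.uniform(-0.5, 0.5, size=latent.shape)
    return np.round(latent)

def rate_estimate(q, bins=np.arange(-8, 9)):
    """Crude entropy (rate) estimate in bits per symbol from a
    histogram of the quantized latents -- a stand-in for a learned
    entropy model enforcing the information bottleneck."""
    hist, _ = np.histogram(q, bins=bins)
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def rd_loss(x, x_hat, q, lam=0.01):
    """Rate-distortion objective L = R + lambda * D, with MSE
    distortion; lambda trades bitrate against reconstruction quality."""
    distortion = float(np.mean((x - x_hat) ** 2))
    return rate_estimate(q) + lam * distortion

# Toy forward pass: analysis -> quantize -> synthesis -> RD loss.
x = rng.normal(size=(4, 16))          # stand-in for a block of frames
latent = 0.5 * x                      # stand-in analysis transform
q = quantize(latent, training=False)  # hard rounding at test time
x_hat = 2.0 * q                       # stand-in synthesis transform
loss = rd_loss(x, x_hat, q)
```

In an actual end-to-end setup, both transforms and the entropy model would be trained jointly by minimizing this loss over video data.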

Keywords:
Computer science, Codec, Autoencoder, Artificial intelligence, Motion compensation, Information bottleneck method, Data compression, Motion estimation, Encoder, Computer vision, Deep learning, Mutual information

Metrics

Cited by: 32
FWCI (Field-Weighted Citation Impact): 2.81
References: 32
Citation Normalized Percentile: 0.91

Topics

Video Coding and Compression Technologies
Physical Sciences →  Computer Science →  Signal Processing
Advanced Image Processing Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Vision and Imaging
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition