Deep learning (DL) is revolutionizing image and video processing and now holds state-of-the-art performance in many tasks. However, video compression has so far resisted the DL revolution. Current attempts rely on complex solutions that interconnect multiple networks to mimic the different layers of conventional codecs. Since DL approaches usually excel when models are allowed to learn their own feature set, a different solution is proposed herein: end-to-end learning of a single network, explicitly avoiding motion estimation/prediction. We formalize it as the rate-distortion optimization of a single spatio-temporal autoencoder, jointly learning a latent-space projection transform and a synthesis transform for low-bitrate video compression. The quantizer uses a rounding scheme, relaxed during training, together with an entropy estimation technique to enforce an information bottleneck. The resulting video compression network shows competitive performance against standard codecs (MPEG-4 Part 2, H.264/AVC, H.265/HEVC), particularly at low bitrates, even while avoiding any motion prediction/compensation method.
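The "rounding scheme, relaxed during training" can be illustrated with a common relaxation from the learned-compression literature: hard rounding at test time, replaced by additive uniform noise during training so gradients can flow through the quantizer. This is a minimal sketch of that general technique, not the paper's exact formulation; the function name and NumPy setting are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(latents, training):
    """Quantize latent values to integers.

    At test time, hard rounding is applied. During training, rounding is
    relaxed to additive uniform noise in [-0.5, 0.5), a standard
    differentiable proxy in learned compression (the paper's exact
    relaxation may differ).
    """
    if training:
        return latents + rng.uniform(-0.5, 0.5, size=latents.shape)
    return np.round(latents)

latents = np.array([0.2, 1.7, -2.4])
hard = quantize(latents, training=False)   # hard rounding: [0., 2., -2.]
soft = quantize(latents, training=True)    # noisy values within 0.5 of latents
```

The noisy surrogate has the same marginal error statistics as rounding, which is what makes the entropy of the relaxed latents a usable estimate of the true post-quantization bitrate.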
Zhaobin Zhang, Yue Li, Kai Zhang, Li Zhang, Yuwen He
Kejun Wu, Zhenxing Li, You Yang, Qiong Liu, Xiaoping Zhang
Wenxuan Guo, Shuo Du, Huiyuan Deng, Zikang Yu, Lin Feng
Alexey A. Gritsenko, Xuehan Xiong, Josip Djolonga, Mostafa Dehghani, Chen Sun, Mario Lučić, Cordelia Schmid, Anurag Arnab