SSGVS: Semantic Scene Graph-to-Video Synthesis

Yuren Cong; Jinhui Yi; Bodo Rosenhahn; Michael Ying Yang

doi:10.1109/cvprw59228.2023.00254

JOURNAL ARTICLE

Year: 2023; Pages: 2555-2565

Abstract

As a natural extension of the image synthesis task, video synthesis has attracted a lot of interest recently. Many image synthesis works utilize class labels or text as guidance. However, neither labels nor text can provide explicit temporal guidance, such as when an action starts or ends. To overcome this limitation, we introduce semantic video scene graphs as input for video synthesis, as they represent the spatial and temporal relationships between objects in the scene. Since video scene graphs are usually temporally discrete annotations, we propose a video scene graph (VSG) encoder that not only encodes the existing video scene graphs but also predicts the graph representations for unlabeled frames. The VSG encoder is pre-trained with different contrastive multi-modal losses. A semantic scene graph-to-video synthesis framework (SSGVS), based on the pre-trained VSG encoder, VQ-VAE, and auto-regressive Transformer, is proposed to synthesize a video given an initial scene image and a non-fixed number of semantic scene graphs. We evaluate SSGVS and other state-of-the-art video synthesis models on the Action Genome dataset and demonstrate the positive significance of video scene graphs in video synthesis. The source code is available at https://github.com/yrcong/SSGVS.
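
To make the notion of a temporally discrete video scene graph concrete, here is a minimal sketch of one possible in-memory representation. The names `Triplet`, `VideoSceneGraph`, `unlabeled_frames`, and `first_and_last_seen` are hypothetical and chosen for illustration only; they are not taken from the SSGVS source code, and the example triplets are made up. The sketch only shows which frames would need a predicted graph representation and how labeled graphs bound the start and end of an action.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Tuple


# Hypothetical types for illustration; not part of the SSGVS code base.
@dataclass(frozen=True)
class Triplet:
    subject: str
    predicate: str
    obj: str


@dataclass
class VideoSceneGraph:
    num_frames: int
    # Maps a labeled frame index to the triplets annotated on that frame.
    frames: Dict[int, FrozenSet[Triplet]] = field(default_factory=dict)

    def unlabeled_frames(self) -> List[int]:
        """Frame indices that carry no annotated scene graph."""
        return [t for t in range(self.num_frames) if t not in self.frames]

    def first_and_last_seen(self, triplet: Triplet) -> Optional[Tuple[int, int]]:
        """First and last labeled frame on which the triplet is annotated."""
        hits = sorted(t for t, graph in self.frames.items() if triplet in graph)
        return (hits[0], hits[-1]) if hits else None


# Made-up annotations: 8 frames, only frames 0, 3, and 6 are labeled.
holding = Triplet("person", "holding", "cup")
drinking = Triplet("person", "drinking from", "cup")
vsg = VideoSceneGraph(
    num_frames=8,
    frames={
        0: frozenset({holding}),
        3: frozenset({holding, drinking}),
        6: frozenset({holding}),
    },
)

print(vsg.unlabeled_frames())             # [1, 2, 4, 5, 7]
print(vsg.first_and_last_seen(holding))   # (0, 6)
print(vsg.first_and_last_seen(drinking))  # (3, 3)
```

In this toy example, frames 1, 2, 4, 5, and 7 have no annotation, which is the gap the abstract says the VSG encoder fills by predicting graph representations for unlabeled frames. How the actual encoder does this is not specified on this page.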

Keywords:

Computer science; Encoder; Scene graph; Artificial intelligence; Computer vision; Graph; Transformer; Theoretical computer science

Metrics

FWCI (Field Weighted Citation Impact): 0.91

Citation Normalized Percentile: 0.71

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Generative Adversarial Networks and Image Synthesis

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Video Analysis and Summarization

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
