Video spatio-temporal generative adversarial network for local action generation

Xuejun Liu; Jiacheng Guo; Zhongji Cui; Ling Liu; Yong Yan; Yun Sha

doi:10.1117/1.jei.32.5.053003

JOURNAL ARTICLE

Year: 2023; Journal: Journal of Electronic Imaging; Vol: 32 (05); Publisher: SPIE

Abstract

Generating action videos of future scenes from static images can help computer vision systems be better applied to video understanding and intelligent decision-making. However, current models focus mainly on the overall motion trend of the generated objects and handle local details poorly, so local regions of the generated video suffer from blurred frames and incoherent motion. This paper proposes a two-stage model, the video spatio-temporal generative adversarial network (VSTGAN), which consists of two GANs: a temporal network (T-net) and a spatial network (S-net). The model combines the advantages of convolutional neural networks (CNNs), recurrent neural networks (RNNs), and GANs to decompose the complex spatio-temporal generation problem into temporal and spatial dimensions, so that VSTGAN can focus on local features in each dimension separately. In the temporal dimension, we propose an RNN unit, the convolutional attention unit (ConvAU), which uses a convolutional attention module to dynamically generate the weights that update the hidden state; T-net uses the ConvAU to generate local dynamics. In the spatial dimension, S-net uses CNNs and attention modules to perform resolution reconstruction of the generated local dynamics for video generation. We build two small-sample datasets and validate our approach on these two new datasets and the public KTH dataset. The results show that our approach can effectively generate local details in future action videos and that its performance on small-sample datasets is competitive with the state of the art in video generation.
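The abstract describes the ConvAU only at a high level: a recurrent unit whose hidden-state update is weighted by a convolutional attention module. The sketch below illustrates that general idea with a generic attention-gated convolutional recurrent step. It is not the paper's ConvAU. The gating form, the single-channel feature maps, and all names (`conv2d_same`, `attention_gated_step`, the kernel arguments) are assumptions made for illustration.

```python
import math


def conv2d_same(img, kernel):
    """Cross-correlate a 2D map with an odd-sized square kernel, zero-padded
    so that the output has the same height and width as the input."""
    h, w = len(img), len(img[0])
    k = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-k, k + 1):
                for dj in range(-k, k + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc += img[y][x] * kernel[di + k][dj + k]
            out[i][j] = acc
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def attention_gated_step(x, h_prev, k_att_x, k_att_h, k_cand_x, k_cand_h):
    """One recurrent step on single-channel maps (assumed form, not the paper's):

        att   = sigmoid(conv(x, k_att_x)  + conv(h_prev, k_att_h))
        cand  = tanh   (conv(x, k_cand_x) + conv(h_prev, k_cand_h))
        h_new = att * cand + (1 - att) * h_prev

    `att` is a per-pixel weight map computed from the current input and the
    previous hidden state, so the update strength varies by location.
    """
    ax, ah = conv2d_same(x, k_att_x), conv2d_same(h_prev, k_att_h)
    cx, ch = conv2d_same(x, k_cand_x), conv2d_same(h_prev, k_cand_h)
    rows, cols = len(x), len(x[0])
    att = [[sigmoid(ax[i][j] + ah[i][j]) for j in range(cols)] for i in range(rows)]
    cand = [[math.tanh(cx[i][j] + ch[i][j]) for j in range(cols)] for i in range(rows)]
    h_new = [
        [att[i][j] * cand[i][j] + (1.0 - att[i][j]) * h_prev[i][j] for j in range(cols)]
        for i in range(rows)
    ]
    return h_new, att


def run_sequence(frames, kernels):
    """Apply the step over a list of frames, starting from a zero hidden state."""
    h = [[0.0] * len(frames[0][0]) for _ in range(len(frames[0]))]
    for frame in frames:
        h, _ = attention_gated_step(frame, h, *kernels)
    return h
```

In a trained model the kernels would be learned and the maps would have many channels. Here `kernels` is a tuple of four 3x3 lists supplied by the caller, which keeps the example free of any deep learning library.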

Keywords:

Computer science; Artificial intelligence; Convolutional neural network; Recurrent neural network; Dimension (graph theory); Generative model; Sample (material); Pattern recognition (psychology); Generative grammar; Computer vision; Machine learning; Artificial neural network

Metrics

FWCI (Field Weighted Citation Impact): 0.18

Citation Normalized Percentile: 0.42

Topics

Human Pose and Action Recognition

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Generative Adversarial Networks and Image Synthesis

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Human Motion and Animation

Physical Sciences → Engineering → Control and Systems Engineering
