JOURNAL ARTICLE

Generative adversarial networks for pose-guided human video generation

Abstract

Generating realistic high-resolution videos of human subjects is a challenging and important task in computer vision. In this work, we focus on human motion transfer: generating a video of a particular subject, observed in a single image, performing a series of motions exemplified by an auxiliary (driving) video. Our GAN-based architecture, DwNet, leverages a dense intermediate pose-guided representation and a refinement process to warp the required subject appearance, in the form of texture, from a source image into a desired pose. Temporal consistency is maintained by further conditioning the decoding process within the GAN on the previously generated frame, so that the video is generated in an iterative, recurrent fashion. We illustrate the efficacy of our approach with state-of-the-art quantitative and qualitative performance on two benchmark datasets: TaiChi and Fashion Modeling. The latter was collected by us and is made publicly available to the community. We also show how the proposed method can be further improved by adopting a recent segmentation-mask-based architecture, such as SPADE, and how temporal inconsistency in video synthesis can be mitigated with a temporal discriminator.
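The abstract's core mechanism, warping the source texture into each driving pose and conditioning the decoder on the previously generated frame, can be summarized as a loop. Below is a minimal sketch of that recurrent generation process in PyTorch; the sub-module names (pose_encoder, warp_module, decoder) and their call signatures are hypothetical stand-ins for illustration, not the authors' released DwNet implementation.

    import torch
    import torch.nn as nn

    class DwNetSketch(nn.Module):
        # Hypothetical sketch of the pose-guided, recurrent generation loop.
        # The three sub-modules are assumptions supplied by the caller:
        #   pose_encoder: encodes a dense-pose map into decoder features
        #   warp_module:  warps the source image's texture into a target pose
        #   decoder:      GAN generator that refines the warped texture
        def __init__(self, pose_encoder, warp_module, decoder):
            super().__init__()
            self.pose_encoder = pose_encoder
            self.warp_module = warp_module
            self.decoder = decoder

        def forward(self, source_image, driving_poses):
            # source_image:  (B, 3, H, W) single image of the subject
            # driving_poses: sequence of dense-pose maps from the driving video
            frames = []
            prev_frame = source_image  # first step is conditioned on the source
            for pose in driving_poses:
                # Warp the subject's appearance from the source image into the
                # target pose, then refine with the decoder; conditioning on
                # the previous frame is what maintains temporal consistency.
                warped = self.warp_module(source_image, pose)
                frame = self.decoder(warped, self.pose_encoder(pose), prev_frame)
                frames.append(frame)
                prev_frame = frame.detach()  # recurrent conditioning
            return torch.stack(frames, dim=1)  # (B, T, 3, H, W) video tensor

At training time, the temporal discriminator mentioned in the abstract would score windows of consecutive generated frames rather than individual frames, penalizing flicker that a per-frame discriminator cannot detect.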

Keywords:
Adversarial system; Computer science; Artificial intelligence; Generative adversarial network; Computer vision; Deep learning

Topics

Advanced Vision and Imaging
Human Pose and Action Recognition
Generative Adversarial Networks and Image Synthesis
(all classified under Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)