Disentangled Representation Learning for Controllable Person Image Generation

Wenju Xu; Chengjiang Long; Yongwei Nie; Guanghui Wang

doi:10.1109/tmm.2023.3345180

JOURNAL ARTICLE

Year: 2024
Journal: IEEE Transactions on Multimedia
Vol: 26
Pages: 6065-6077
Publisher: Institute of Electrical and Electronics Engineers

Abstract

In this paper, we propose DRL-CPG, a novel framework that learns disentangled latent representations for controllable person image generation. It produces realistic person images with desired poses and human attributes (e.g., head, upper clothes, and pants) taken from various source persons. Unlike existing works that rely on semantic masks to obtain the representation of each component, we generate the disentangled latent code with a novel transformer-based attribute encoder trained by curriculum learning, progressing from relatively easy steps to gradually harder ones. A random component mask-agnostic strategy randomly removes component masks from the person segmentation masks, which increases the training difficulty and pushes the transformer encoder to recognize the underlying boundaries between components. This enables the model to transfer both the shape and the texture of the components. Furthermore, we propose a novel attribute decoder network that integrates multi-level attributes (e.g., the structure feature and the attribute representation) using Dual Adaptive Denormalization (DAD) residual blocks. Extensive experiments demonstrate that the proposed approach transfers both the texture and the shape of different human parts and yields realistic results. To our knowledge, we are the first to learn disentangled latent representations with transformers for person image generation.
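The random mask-removal idea combined with an easy-to-hard schedule can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the schedule or the drop rule, so the linear ramp, the `max_prob` parameter, the independent per-component drop, and the names `drop_probability` and `randomly_drop_masks` are all assumptions made for illustration. Only the component names come from the abstract's examples.

```python
import random

# Component names taken from the abstract's examples; the real label set may differ.
COMPONENTS = ["head", "upper_clothes", "pants"]


def drop_probability(step, total_steps, max_prob=0.5):
    """Hypothetical linear curriculum: 0 at the first step, max_prob at the last."""
    if total_steps <= 1:
        return max_prob
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    return max_prob * progress


def randomly_drop_masks(masks, step, total_steps, rng, max_prob=0.5):
    """Return a new dict with some component masks removed.

    `masks` maps a component name to its binary mask (any object).
    Each component is dropped independently with the current curriculum
    probability, so later steps hide more masks (assumed rule).
    """
    p = drop_probability(step, total_steps, max_prob)
    return {name: m for name, m in masks.items() if rng.random() >= p}


rng = random.Random(0)
masks = {name: [[1, 0], [0, 1]] for name in COMPONENTS}

# Early in training nothing is hidden; with max_prob=1.0 the last step hides everything.
early = randomly_drop_masks(masks, step=0, total_steps=100, rng=rng)
late = randomly_drop_masks(masks, step=99, total_steps=100, rng=rng, max_prob=1.0)
```

With this rule, the encoder would see full segmentation guidance at the start and progressively less of it later, which is one plausible reading of "from a relatively easy step to a gradually hard one".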

Keywords:

Computer science; Feature learning; Encoder; Artificial intelligence; Transformer; Component (thermodynamics); Segmentation; Pattern recognition (psychology); Representation (politics); Computer vision; Machine learning

Metrics

FWCI (Field Weighted Citation Impact): 3.71

Citation Normalized Percentile: 0.87

Topics

Video Surveillance and Tracking Methods

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Generative Adversarial Networks and Image Synthesis

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Human Pose and Action Recognition

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Learning Disentangled Equivariant Representation for Explicitly Controllable 3D Molecule Generation

Disentangled Representation Learning for Controllable Image Synthesis: An Information-Theoretic Perspective

Disentangled and Controllable Face Image Generation via 3D Imitative-Contrastive Learning

Controllable image generation based on causal representation learning

Learning Disentangled Representation for Robust Person Re-identification