JOURNAL ARTICLE

Disentangled Representation Learning for Controllable Image Synthesis: An Information-Theoretic Perspective

Abstract

In this paper, we study disentangled representation learning and controllable image synthesis in deep generative models. We develop an encoder-decoder architecture for a variant of the Variational Auto-Encoder (VAE) with two latent codes, z1 and z2. Our framework uses z2 to capture specified factors of variation while z1 captures the complementary factors. To this end, we analyze the learning problem from the perspective of multivariate mutual information, derive optimizable lower bounds on the conditional mutual information in the image synthesis process, and incorporate them into the training objective. We validate our method empirically on the Color MNIST and CelebA datasets by demonstrating controllable image synthesis. The proposed paradigm is simple yet effective and applies to many settings, including those where no explicit factorization of features is available or where the features are non-categorical.
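The abstract's two-latent-code design can be illustrated with a minimal forward-pass sketch. This is not the paper's implementation: the layer sizes, the single hidden layer, and the use of NumPy instead of a deep-learning framework are all illustrative assumptions; only the overall structure (one encoder producing z1 and z2 via the reparameterization trick, one decoder consuming both, and controllable synthesis by resampling z2 while holding z1 fixed) reflects the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): a flattened 28x28 image,
# z1 for the complementary factors, z2 for the specified factors.
X_DIM, Z1_DIM, Z2_DIM, H_DIM = 784, 8, 4, 64

def init(shape):
    return rng.normal(0.0, 0.05, shape)

# Encoder parameters: x -> hidden -> (mu, log_var) over the joint code.
W_enc, b_enc = init((X_DIM, H_DIM)), np.zeros(H_DIM)
W_mu, b_mu = init((H_DIM, Z1_DIM + Z2_DIM)), np.zeros(Z1_DIM + Z2_DIM)
W_lv, b_lv = init((H_DIM, Z1_DIM + Z2_DIM)), np.zeros(Z1_DIM + Z2_DIM)

# Decoder parameters: (z1, z2) -> hidden -> reconstructed x.
W_dec, b_dec = init((Z1_DIM + Z2_DIM, H_DIM)), np.zeros(H_DIM)
W_out, b_out = init((H_DIM, X_DIM)), np.zeros(X_DIM)

def encode(x):
    """Encode a batch of images into the two latent codes."""
    h = np.tanh(x @ W_enc + b_enc)
    mu, log_var = h @ W_mu + b_mu, h @ W_lv + b_lv
    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
    # Split the joint sample into the two codes.
    return z[:, :Z1_DIM], z[:, Z1_DIM:], mu, log_var

def decode(z1, z2):
    """Decode the concatenated latent codes back to pixel space."""
    h = np.tanh(np.concatenate([z1, z2], axis=1) @ W_dec + b_dec)
    return 1.0 / (1.0 + np.exp(-(h @ W_out + b_out)))  # sigmoid pixels

x = rng.random((2, X_DIM))              # a toy batch of two "images"
z1, z2, mu, log_var = encode(x)
x_hat = decode(z1, z2)                  # reconstruction

# Controllable synthesis: hold z1 fixed and resample z2, so only the
# specified factors of variation change in the output.
x_edit = decode(z1, rng.standard_normal(z2.shape))
```

In the paper's full training objective, the lower bounds on conditional mutual information would be added to the usual VAE loss to encourage this separation; the sketch above only shows the architectural split of the latent space.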

Keywords:
Disentangled representation learning; Controllable image synthesis; Variational Auto-Encoder; Mutual information; Deep generative models
