JOURNAL ARTICLE

Human-controllable and structured deep generative models

Tran, Dieu Linh

Year: 2021   Journal: Spiral (Imperial College London)   Publisher: Imperial College London

Abstract

Deep generative models are a class of probabilistic models that aim to learn the underlying data distribution. These models are usually trained in an unsupervised way and thus do not require any labels. Generative models such as Variational Autoencoders and Generative Adversarial Networks have made astounding progress in recent years. These models offer several benefits: easy sampling and evaluation, efficient learning of low-dimensional representations for downstream tasks, and better understanding through interpretable representations. However, even though the quality of these models has improved immensely, the ability to control their style and structure remains limited. Structured and human-controllable representations of generative models are essential for human-machine interaction and other applications, including fairness, creativity, and entertainment. This thesis investigates learning human-controllable and structured representations with deep generative models. In particular, we focus on generative modelling of 2D images.

In the first part, we focus on learning clustered representations. We propose semi-parametric hierarchical variational autoencoders to estimate the intensity of facial action units. The semi-parametric model forms a hybrid generative-discriminative model that leverages both a parametric Variational Autoencoder and a non-parametric Gaussian Process autoencoder. We show superior performance in comparison with existing facial action unit estimation approaches. Building on the results and analysis of the learned representation, we then focus on learning Mixture-of-Gaussians representations in an autoencoding framework. We deviate from the conventional autoencoding framework and consider a regularized objective based on the Cauchy-Schwarz divergence, which admits a closed-form expression for Mixture-of-Gaussians distributions and thus allows efficient optimization of the autoencoding objective.
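The abstract notes that the Cauchy-Schwarz divergence, D_CS(p, q) = -log(∫pq / √(∫p² ∫q²)), has a closed form for Mixture-of-Gaussians distributions, because the product integral of two Gaussians is itself a Gaussian density: ∫ N(x; m₁, v₁) N(x; m₂, v₂) dx = N(m₁; m₂, v₁ + v₂). A minimal one-dimensional sketch of that closed form (the 1-D setting and function names are our own illustration, not the thesis's implementation, which operates on autoencoder latents):

```python
import numpy as np

def gauss(x, mu, var):
    # Density of a 1-D Gaussian N(mu, var) evaluated at point x.
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def cross_term(w1, mu1, var1, w2, mu2, var2):
    # Closed-form ∫ p(x) q(x) dx for two Gaussian mixtures, using
    # ∫ N(x; m1, v1) N(x; m2, v2) dx = N(m1; m2, v1 + v2).
    total = 0.0
    for a, ma, va in zip(w1, mu1, var1):
        for b, mb, vb in zip(w2, mu2, var2):
            total += a * b * gauss(ma, mb, va + vb)
    return total

def cs_divergence(w1, mu1, var1, w2, mu2, var2):
    # D_CS(p, q) = -log( ∫pq / sqrt(∫p² ∫q²) ); zero iff p == q.
    pq = cross_term(w1, mu1, var1, w2, mu2, var2)
    pp = cross_term(w1, mu1, var1, w1, mu1, var1)
    qq = cross_term(w2, mu2, var2, w2, mu2, var2)
    return -np.log(pq / np.sqrt(pp * qq))
```

Because every term is closed-form, the divergence is differentiable in the mixture parameters, which is what makes it usable as a regularizer inside an autoencoding objective.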
We show that our model outperforms existing Variational Autoencoders in density estimation, clustering, and semi-supervised facial action detection.

In the second part, we focus on learning disentangled representations for conditional generation and fair facial attribute classification. Conditional image generation relies on access to large-scale annotated datasets. Nevertheless, the geometry of visual objects, such as faces, cannot easily be learned implicitly, which deteriorates image fidelity. We propose incorporating facial landmarks with a statistical shape model and a differentiable piecewise affine transformation to separate the representations for appearance and shape. Incorporating facial landmarks makes generation controllable and separates different appearances and geometries. In our last work, we use weak supervision to disentangle groups of variations. Most prior work on learning disentangled representations has been unsupervised; however, recent works have shown that learning disentangled representations is not identifiable without inductive biases. Since then, there has been a shift towards weakly-supervised disentanglement learning. We investigate regularization based on the Kullback-Leibler divergence to disentangle groups of variations. The goal is to obtain consistent and separated subspaces for different groups, e.g., for content-style learning. Our evaluation shows improved disentanglement and competitive performance for image clustering and fair facial attribute classification with weak supervision, compared to supervised and semi-supervised approaches.
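The abstract does not spell out the exact Kullback-Leibler regularizer used for group disentanglement, but such regularizers typically penalize the KL divergence between diagonal Gaussian posteriors, which has a well-known closed form. As a hedged illustration (function names and the weak-pairing scenario are our own, not the thesis's objective):

```python
import numpy as np

def kl_diag_gauss(mu1, logvar1, mu2, logvar2):
    # Closed-form KL( N(mu1, diag(exp(logvar1))) || N(mu2, diag(exp(logvar2))) ),
    # summed over latent dimensions; zero iff the two posteriors coincide.
    var1, var2 = np.exp(logvar1), np.exp(logvar2)
    return 0.5 * np.sum(logvar2 - logvar1 + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

# Hypothetical use: encourage the 'content' posteriors of two weakly-paired
# images to agree, while leaving their 'style' posteriors unconstrained.
mu_a, lv_a = np.zeros(4), np.zeros(4)
mu_b, lv_b = np.full(4, 0.5), np.zeros(4)
penalty = kl_diag_gauss(mu_a, lv_a, mu_b, lv_b)  # > 0 when posteriors differ
```

Penalizing this quantity only on the shared-content subspace, and not on the style subspace, is one way such a regularizer can produce the consistent, separated group subspaces described above.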

Keywords:
Autoencoder, Generative model, Deep learning, Mixture model, Artificial neural network, Feature learning

Metrics

Cited By: 0
FWCI (Field-Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.25

Topics

Generative Adversarial Networks and Image Synthesis
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Face recognition and analysis
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology

Related Documents

JOURNAL ARTICLE

Structured Generative Models for Controllable Scene and 3D Content Synthesis

Pavllo, Dario

Journal:   Repository for Publications and Research Data (ETH Zurich) Year: 2023
JOURNAL ARTICLE

Towards controllable generative models

Han, Ligong

Journal:   Rutgers University Community Repository (Rutgers University) Year: 2024
DISSERTATION

Structured visual understanding and generation with deep generative models

Song, Yuhang (author)

University:   University of Southern California Digital Library Year: 2020
JOURNAL ARTICLE

Human Interpretable Radar Through Deep Generative Models

Nir Dvorecki, Yuval Amizur, Leor Banin

Journal:   2022 19th European Radar Conference (EuRAD) Year: 2022 Pages: 389-392