JOURNAL ARTICLE

Human-controllable and structured deep generative models

Tran, Dieu Linh

Year: 2021   Journal: Spiral (Imperial College London)   Publisher: Imperial College London

Abstract

Deep generative models are a class of probabilistic models that aim to learn the underlying data distribution. These models are usually trained in an unsupervised way and thus do not require any labels. Generative models such as Variational Autoencoders and Generative Adversarial Networks have made astounding progress in recent years. These models offer several benefits: easy sampling and evaluation, efficient learning of low-dimensional representations for downstream tasks, and better understanding through interpretable representations. However, even though the quality of these models has improved immensely, the ability to control their style and structure remains limited. Structured and human-controllable representations of generative models are essential for human-machine interaction and other applications, including fairness, creativity, and entertainment. This thesis investigates learning human-controllable and structured representations with deep generative models. In particular, we focus on generative modelling of 2D images.

In the first part, we focus on learning clustered representations. We propose semi-parametric hierarchical variational autoencoders to estimate the intensity of facial action units. The semi-parametric model forms a hybrid generative-discriminative model that leverages both a parametric Variational Autoencoder and a non-parametric Gaussian Process autoencoder. We show superior performance in comparison with existing facial action unit estimation approaches. Building on the results and analysis of the learned representation, we then focus on learning Mixture-of-Gaussians representations in an autoencoding framework. We deviate from the conventional autoencoding framework and consider a regularized objective based on the Cauchy-Schwarz divergence, which admits a closed-form expression for Mixture-of-Gaussians distributions and thus allows efficient optimization of the autoencoding objective.
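The abstract notes that the Cauchy-Schwarz divergence, D_CS(p, q) = -log(∫pq / √(∫p² ∫q²)), has a closed form for Mixture-of-Gaussians distributions, because the product integral of two Gaussians is itself a Gaussian density: ∫ N(x; m₁, v₁) N(x; m₂, v₂) dx = N(m₁; m₂, v₁ + v₂). A minimal one-dimensional sketch of that closed form (the 1-D setting and function names are our own illustration, not the thesis's implementation, which operates on autoencoder latents):

```python
import numpy as np

def gauss(x, mu, var):
    # Density of a 1-D Gaussian N(mu, var) evaluated at point x.
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def cross_term(w1, mu1, var1, w2, mu2, var2):
    # Closed-form ∫ p(x) q(x) dx for two Gaussian mixtures, using
    # ∫ N(x; m1, v1) N(x; m2, v2) dx = N(m1; m2, v1 + v2).
    total = 0.0
    for a, ma, va in zip(w1, mu1, var1):
        for b, mb, vb in zip(w2, mu2, var2):
            total += a * b * gauss(ma, mb, va + vb)
    return total

def cs_divergence(w1, mu1, var1, w2, mu2, var2):
    # D_CS(p, q) = -log( ∫pq / sqrt(∫p² ∫q²) ); zero iff p == q.
    pq = cross_term(w1, mu1, var1, w2, mu2, var2)
    pp = cross_term(w1, mu1, var1, w1, mu1, var1)
    qq = cross_term(w2, mu2, var2, w2, mu2, var2)
    return -np.log(pq / np.sqrt(pp * qq))
```

Because every term is closed-form, the divergence is differentiable in the mixture parameters, which is what makes it usable as a regularizer inside an autoencoding objective.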
We show that our model outperforms existing Variational Autoencoders in density estimation, clustering, and semi-supervised facial action detection.

In the second part, we focus on learning disentangled representations for conditional generation and fair facial attribute classification. Conditional image generation relies on access to large-scale annotated datasets. Nevertheless, the geometry of visual objects, such as faces, cannot easily be learned implicitly, which deteriorates image fidelity. We propose incorporating facial landmarks with a statistical shape model and a differentiable piecewise affine transformation to separate the representations for appearance and shape. Incorporating facial landmarks makes generation controllable and separates different appearances and geometries. In our last work, we use weak supervision to disentangle groups of variations. Most prior work on learning disentangled representations has been unsupervised; however, recent works have shown that learning disentangled representations is not identifiable without inductive biases. Since then, there has been a shift towards weakly-supervised disentanglement learning. We investigate regularization based on the Kullback-Leibler divergence to disentangle groups of variations. The goal is to obtain consistent and separated subspaces for different groups, e.g., for content-style learning. Our evaluation shows improved disentanglement and competitive performance for image clustering and fair facial attribute classification with weak supervision, compared to supervised and semi-supervised approaches.
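The abstract does not spell out the exact Kullback-Leibler regularizer used for group disentanglement, but such regularizers typically penalize the KL divergence between diagonal Gaussian posteriors, which has a well-known closed form. As a hedged illustration (function names and the weak-pairing scenario are our own, not the thesis's objective):

```python
import numpy as np

def kl_diag_gauss(mu1, logvar1, mu2, logvar2):
    # Closed-form KL( N(mu1, diag(exp(logvar1))) || N(mu2, diag(exp(logvar2))) ),
    # summed over latent dimensions; zero iff the two posteriors coincide.
    var1, var2 = np.exp(logvar1), np.exp(logvar2)
    return 0.5 * np.sum(logvar2 - logvar1 + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

# Hypothetical use: encourage the 'content' posteriors of two weakly-paired
# images to agree, while leaving their 'style' posteriors unconstrained.
mu_a, lv_a = np.zeros(4), np.zeros(4)
mu_b, lv_b = np.full(4, 0.5), np.zeros(4)
penalty = kl_diag_gauss(mu_a, lv_a, mu_b, lv_b)  # > 0 when posteriors differ
```

Penalizing this quantity only on the shared-content subspace, and not on the style subspace, is one way such a regularizer can produce the consistent, separated group subspaces described above.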

Keywords:
Autoencoder, Generative model, Deep learning, Mixture model, Artificial neural network, Feature learning

Metrics

Cited By: 0
FWCI (Field-Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.25

Topics

Generative Adversarial Networks and Image Synthesis
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Face recognition and analysis
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology

Related Documents

JOURNAL ARTICLE

Structured Generative Models for Controllable Scene and 3D Content Synthesis

Pavllo, Dario

Journal:   Repository for Publications and Research Data (ETH Zurich) Year: 2023
JOURNAL ARTICLE

Towards controllable generative models

Han, Ligong

Journal:   Rutgers University Community Repository (Rutgers University) Year: 2024
DISSERTATION

Structured visual understanding and generation with deep generative models

Song, Yuhang (author)

University:   University of Southern California Digital Library Year: 2020
JOURNAL ARTICLE

Human Interpretable Radar Through Deep Generative Models

Nir Dvorecki, Yuval Amizur, Leor Banin

Journal:   2022 19th European Radar Conference (EuRAD) Year: 2022 Pages: 389-392