Image synthesis is an important problem in computer vision with many applications, such as computer-aided design and photo editing. Remarkable progress has been made in this direction since the emergence of Generative Adversarial Networks (GANs). However, GANs still face key challenges in generating high-quality images: the difficulty of directly approximating the high-resolution image distribution, poor generalization to datasets with multiple classes, and the frequent occurrence of mode collapse and unstable training. To tackle these challenges, we conduct extensive studies on designing new network architectures, adding regularization, introducing heuristic tricks, and modifying the learning objectives and dynamics. (i) New Stacked Generative Adversarial Networks (StackGANs) are proposed for high-resolution image synthesis. StackGAN-v1 is first built to decompose the hard image generation problem into more manageable sub-problems through a sketch-refinement process, generating unprecedented 256×256 photo-realistic images from text descriptions. Moreover, a novel Conditioning Augmentation technique, which encourages smoothness in the latent conditioning manifold, is introduced to improve the diversity of the synthesized images and stabilize the training of the conditional GAN. To further improve the quality of generated samples and stabilize GAN training, an advanced multi-stage generative adversarial network architecture, StackGAN-v2, is presented for both conditional and unconditional generative tasks. (ii) A novel Self-Attention Generative Adversarial Network (SAGAN) is introduced for multi-class image generation. Our SAGAN incorporates the self-attention mechanism into the convolutional GAN framework so that it can model long-range, multi-level dependencies for generating realistic images on challenging datasets such as ImageNet.
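The core idea of Conditioning Augmentation is to replace the fixed text embedding fed to the generator with a sample drawn from a Gaussian whose parameters are derived from that embedding. The sketch below illustrates this sampling step in pure Python; in the full model, the mean and log-variance come from learned linear layers, whereas here they are stand-in slices of the embedding, so the function and its arguments are illustrative assumptions rather than the exact implementation.

```python
import math
import random


def conditioning_augmentation(text_embedding, dim, seed=0):
    """Sample a smooth latent conditioning vector from a text embedding.

    Instead of conditioning the generator on the raw embedding,
    Conditioning Augmentation fits a diagonal Gaussian N(mu, sigma^2)
    and samples from it, which smooths the conditioning manifold and
    adds diversity to the synthesized images.
    """
    rng = random.Random(seed)
    # Stand-ins for the learned projection layers of the real model:
    mu = text_embedding[:dim]
    log_sigma = text_embedding[dim:2 * dim]
    # Reparameterization: c = mu + sigma * eps, with eps ~ N(0, I).
    eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return [m + math.exp(s) * e for m, s, e in zip(mu, log_sigma, eps)]
```

Because sampling is reparameterized as `mu + sigma * eps`, gradients can flow through `mu` and `log_sigma` during training, which is what allows the smoothness regularization on the conditioning manifold.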
Moreover, we show that spectral normalization applied to the generator can stabilize GAN training and that the two time-scale update rule (TTUR) can speed up the training of regularized discriminators. (iii) We present the Optimal Transport Generative Adversarial Network (OT-GAN), a variant of GANs that minimizes a new metric measuring the distance between the generator distribution and the data distribution. This metric, called the mini-batch energy distance, combines optimal transport in its primal form with an energy distance defined in an adversarially learned feature space, resulting in a highly discriminative distance function with unbiased mini-batch gradients. Both qualitative and quantitative validation experiments are conducted for all proposed methods.
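The energy-distance component of the mini-batch metric can be sketched concisely. The snippet below computes the classical energy distance between two mini-batches of vectors, 2·E‖x−y‖ − E‖x−x′‖ − E‖y−y′‖, in pure Python; it omits the primal-form optimal transport cost and the adversarially learned feature space, so it is a simplified illustration of one ingredient, not the full OT-GAN objective.

```python
import math


def mean_pairwise_dist(A, B):
    # Average Euclidean distance over all pairs (a, b) with a in A, b in B.
    return sum(math.dist(a, b) for a in A for b in B) / (len(A) * len(B))


def energy_distance(X, Y):
    """Energy distance between two mini-batches of feature vectors.

    Zero when X and Y are drawn from the same distribution (in
    expectation) and positive otherwise, which makes it a usable
    discrepancy between generator samples and data samples.
    """
    return (2 * mean_pairwise_dist(X, Y)
            - mean_pairwise_dist(X, X)
            - mean_pairwise_dist(Y, Y))
```

In OT-GAN this statistic is evaluated on critic features rather than raw samples, which is what makes the resulting distance highly discriminative while keeping mini-batch gradients unbiased.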