Image generation is an intriguing research topic, and text-conditioned image generation is a specific problem within it. Text-controlled image generation requires understanding the linguistic semantics of the text and accurately mapping them to visual semantics, which is a challenging task. This work aims to generate and manipulate human face images from text descriptions using StyleGAN. The proposed architecture comprises two main pipelines, one for text-based image generation and another for text-based image manipulation, each consisting of a sequence of models. For text-based image generation, a Text Encoder and a Latent Code Decoder map the text to the latent space of a pre-trained StyleGAN. For text-based image manipulation, a GAN inversion technique maps the real-world image to the latent space of the pre-trained StyleGAN to obtain its latent vector. Latent directions are learned in the disentangled latent space of the pre-trained StyleGAN and are used for image manipulation. The target attribute is identified by applying a Latent Direction Classifier to the text input, and the corresponding latent direction is used to modify the latent code of the original image. The final manipulated image is produced by feeding the modified latent code to the StyleGAN generator.
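The manipulation step described above, shifting an inverted latent code along a learned attribute direction before decoding it with the generator, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name, the 512-dimensional W-space assumption, and the random stand-ins for the inverted code and the learned direction are all hypothetical; in the full pipeline the edited code would be passed to the pre-trained StyleGAN generator.

```python
import numpy as np

LATENT_DIM = 512  # assumed dimensionality of StyleGAN's W space


def manipulate_latent(w, direction, strength):
    """Shift a latent code along a learned attribute direction.

    w:         (LATENT_DIM,) latent code of the inverted image
    direction: (LATENT_DIM,) learned direction for one attribute (e.g. "smile")
    strength:  scalar controlling how strongly the attribute is applied
    """
    direction = direction / np.linalg.norm(direction)  # take a unit-length step
    return w + strength * direction


# Toy demonstration with random vectors standing in for a real
# inverted latent code and a learned attribute direction.
rng = np.random.default_rng(0)
w = rng.standard_normal(LATENT_DIM)          # stand-in for GAN-inverted code
smile_dir = rng.standard_normal(LATENT_DIM)  # stand-in for a learned direction

w_edit = manipulate_latent(w, smile_dir, strength=3.0)
# In the full pipeline, w_edit is fed to the pre-trained StyleGAN
# generator to render the manipulated face image.
```

The edit is a single vector addition in latent space; because the pre-trained StyleGAN's latent space is disentangled, moving along one direction changes primarily the targeted attribute while leaving the rest of the face intact.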