In recent years, deep learning has had a substantial impact on the computer vision community. Modern deep models can recognize thousands of image categories, with architectures that have grown increasingly varied and deep. In complex scenes, deep neural models can localize objects, detect a large number of object categories, and subsequently perform instance segmentation. Most recently, a number of scene graph generation and visual relationship detection methods have been developed for high-level image understanding, in order to extract more fine-grained and structural representations from images. As a dual problem of visual understanding, visual generation has also attracted much attention in recent years in light of deep learning techniques. Deep generative models can generate realistic images at high resolution and quality, and can further be applied to translate images across different domains and environments. The world around us is highly structured, and so are the images that depict it: an image may contain multiple foreground object categories as well as varied backgrounds, whether in natural scenes or artificial scenarios. In this thesis, we mainly leverage structural information for visual generation and understanding in the following tasks: 1) leveraging semantic structure to generate realistic images