Synthesizing high-resolution remote sensing images from given text descriptions has great potential for expanding image data sets and thereby unlocking the power of deep learning in remote sensing image processing. However, little efficient research has been carried out on this formidable task. Given a remote sensing image, the structural rationality of ground objects is critical for judging whether it is real or fake; e.g., real bridges are almost always straight, so a sinuous one is easily judged as fake. Inspired by this observation, we propose a multistage structured generative adversarial network (StrucGAN) that synthesizes remote sensing images in a structured way from text descriptions. StrucGAN uses structural information extracted by an unsupervised segmentation module to enable the discriminators to distinguish images in a structured way. The generators of StrucGAN are thus forced to synthesize structurally reasonable image content, which enhances image authenticity. The multistage framework enables StrucGAN to generate remote sensing images at increasing resolution, stage by stage. Quantitative and qualitative experimental results show that the proposed StrucGAN outperforms the baseline and can synthesize high-resolution, realistic, structurally reasonable remote sensing images that are semantically consistent with the given text descriptions.
Xingzhe Su, Daixi Jia, Fengge Wu, Junsuo Zhao, Changwen Zheng