JOURNAL ARTICLE

Structure-Aware Generative Adversarial Network for Text-to-Image Generation

Abstract

Text-to-image generation aims to synthesize photo-realistic images from textual descriptions. Existing methods typically align images with their corresponding texts in a joint semantic space. However, the modality gap between image and text features in this joint space leads to misalignment. Meanwhile, the limited receptive field of convolutional neural networks causes structural distortions in the generated images. In this work, a structure-aware generative adversarial network (SaGAN) is proposed that (1) semantically aligns multimodal features in the joint semantic space in a learnable manner, and (2) improves the structure and contours of generated images through specially designed content-invariant negative samples. Experimental results show that, compared with state-of-the-art approaches, SaGAN improves FID by more than 30.1% on the CUB dataset and 8.2% on the COCO dataset.
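The abstract does not give SaGAN's loss functions, so the block below is only a minimal sketch of two generic building blocks behind its terminology, assuming NumPy. It is not the authors' implementation. The first part is a symmetric InfoNCE-style contrastive loss. It maps image and text features into a joint space through projection matrices (`w_img`, `w_txt`, hypothetical stand-ins for learnable parameters). It pulls each matched pair together and treats the other pairs in the batch as negatives. The content-invariant negative samples described in the abstract are not modeled here. The second part computes the Fréchet distance that FID is built on, plus the relative-reduction arithmetic behind a figure such as "30.1% improvement". All function names are illustrative.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row of x to (approximately) unit Euclidean length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def contrastive_alignment_loss(img_feats: np.ndarray, txt_feats: np.ndarray,
                               w_img: np.ndarray, w_txt: np.ndarray,
                               temperature: float = 0.1) -> float:
    """Generic symmetric InfoNCE loss over a batch of matched image-text pairs.

    Row i of img_feats and row i of txt_feats form a positive pair; all other
    rows in the batch act as negatives. w_img / w_txt are hypothetical learnable
    projections into a shared (joint) embedding space; training is not shown.
    """
    z_img = l2_normalize(img_feats @ w_img)
    z_txt = l2_normalize(txt_feats @ w_txt)
    logits = z_img @ z_txt.T / temperature  # (B, B) scaled cosine similarities
    idx = np.arange(logits.shape[0])

    def cross_entropy(l: np.ndarray) -> float:
        # Numerically stable row-wise log-softmax; the diagonal holds positives.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return float(-log_probs[idx, idx].mean())

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))


def frechet_distance(mu1: np.ndarray, sigma1: np.ndarray,
                     mu2: np.ndarray, sigma2: np.ndarray) -> float:
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2).

    FID applies this to the mean and covariance of Inception-v3 features of
    real vs. generated images; lower is better.
    d^2 = ||mu1 - mu2||^2 + Tr(sigma1) + Tr(sigma2) - 2 Tr((sigma1 sigma2)^(1/2))
    """
    diff = mu1 - mu2
    # For PSD covariances, sigma1 @ sigma2 has real non-negative eigenvalues,
    # so Tr(sqrt(sigma1 sigma2)) is the sum of their square roots.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_covmean = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def relative_fid_improvement(baseline_fid: float, new_fid: float) -> float:
    """Relative FID reduction; 0.301 means the new FID is 30.1% lower."""
    return (baseline_fid - new_fid) / baseline_fid
```

Because FID is lower-is-better, an "improvement" is a reduction. For instance, with hypothetical scores, a baseline FID of 100.0 that drops to 69.9 gives `relative_fid_improvement(100.0, 69.9)` of about 0.301. The eigenvalue route avoids a full matrix square root, since only its trace enters the formula.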

Keywords:
Semantic space, Computer science, Artificial intelligence, Generative adversarial network, Convolutional neural network, Pattern recognition, Semantic gap, Computer vision, Image retrieval, Mathematics

Metrics

Cited by: 1
FWCI (Field Weighted Citation Impact): 0.18
References: 22
Citation Normalized Percentile: 0.42

Topics

Generative Adversarial Networks and Image Synthesis
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Computer Graphics and Visualization Techniques
Physical Sciences → Computer Science → Computer Graphics and Computer-Aided Design
Digital Media Forensic Detection
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition