JOURNAL ARTICLE

Fine-Grained Cross-Modal Fusion Based Refinement for Text-to-Image Synthesis

Haoran Sun, Yang Wang, Haipeng Liu, Biao Qian

Year: 2023   Journal: Chinese Journal of Electronics   Vol: 32 (6)   Pages: 1329-1340   Publisher: Institution of Engineering and Technology

Abstract

Text-to-image synthesis refers to generating visually realistic and semantically consistent images from given textual descriptions. Previous approaches generate an initial low-resolution image and then refine it to high resolution. Despite remarkable progress, these methods do not fully utilize the given text and can generate text-mismatched images, especially when the text description is complex. We propose a novel fine-grained text-image fusion based generative adversarial network (FF-GAN), which consists of two modules: a fine-grained text-image fusion block (FF-Block) and global semantic refinement (GSR). The proposed FF-Block integrates an attention block and several convolution layers to fuse the fine-grained word-context features into the corresponding visual features, so that the text information is fully used to refine the initial image with more details. The GSR is proposed to improve the global semantic consistency between linguistic and visual features during the refinement process. Extensive experiments on the CUB-200 and COCO datasets demonstrate the superiority of FF-GAN over other state-of-the-art approaches in generating images that are semantically consistent with the given texts.
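
The fusion idea described in the abstract (attend over words for each image region, then merge the resulting word-context vector into the visual feature) can be illustrated with a small sketch. This is not the authors' implementation: it assumes plain dot-product attention and uses a single per-position linear projection as a stand-in for the convolution layers. The names word_attention, fuse, and proj are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def word_attention(region, words):
    """Attention weights over words, and the word-context vector, for one region.

    Assumption: similarity is a plain dot product between the region feature
    and each word feature (both of dimension D).
    """
    weights = softmax([dot(region, w) for w in words])
    dim = len(words[0])
    context = [
        sum(weights[t] * words[t][d] for t in range(len(words)))
        for d in range(dim)
    ]
    return weights, context


def fuse(regions, words, proj):
    """Fuse word context into every region feature.

    proj is a (D x 2D) matrix applied to [region; context] at each position.
    It plays the role of a 1x1 convolution here, which is an assumption of
    this sketch rather than the layer configuration used in the paper.
    """
    fused = []
    for region in regions:
        _, context = word_attention(region, words)
        stacked = region + context  # list concatenation, length 2D
        fused.append([dot(row, stacked) for row in proj])
    return fused


# Toy example: two image regions and two word features, D = 2.
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[2.0, 0.0], [0.0, 2.0]]
# This projection simply adds the region feature and its word context.
proj = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
refined = fuse(regions, words, proj)
```

In this toy setup each region attends most strongly to the word whose feature points in the same direction, so the refined feature is pulled toward that word. A real model would learn the projection and operate on a full spatial feature map.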

Keywords:
Computer science; Block (permutation group theory); Consistency (knowledge bases); Image (mathematics); Artificial intelligence; Context (archaeology); Convolution (computer science); Fuse (electrical); Natural language processing; Pattern recognition (psychology); Word (group theory); Semantics (computer science); Artificial neural network; Mathematics

Metrics

Cited By: 8
FWCI (Field Weighted Citation Impact): 1.46
Refs: 54
Citation Normalized Percentile: 0.79

Topics

Generative Adversarial Networks and Image Synthesis
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

BOOK-CHAPTER

Fine-grained Text to Image Synthesis

Xu Ouyang, Ying Chen, Kaiyue Zhu, Gady Agam

Lecture Notes in Computer Science   Year: 2024   Pages: 359-373

JOURNAL ARTICLE

TECMH: Transformer-Based Cross-Modal Hashing For Fine-Grained Image-Text Retrieval

Qiqi Li, Longfei Ma, Zheng Jiang, Mingyong Li, Bo Jin

Journal: Computers, Materials & Continua   Year: 2023   Vol: 75 (2)   Pages: 3713-3728

JOURNAL ARTICLE

Paired Cross-Modal Data Augmentation for Fine-Grained Image-to-Text Retrieval

Hao Wang, Guosheng Lin, Steven C. H. Hoi, Chunyan Miao

Journal: Proceedings of the 30th ACM International Conference on Multimedia   Year: 2022   Pages: 5517-5526