Fine-Grained Cross-Modal Fusion Based Refinement for Text-to-Image Synthesis

Haoran Sun; Yang Wang; Haipeng Liu; Biao Qian

doi:10.23919/cje.2022.00.227

JOURNAL ARTICLE

Year: 2023; Journal: Chinese Journal of Electronics; Vol: 32 (6); Pages: 1329-1340; Publisher: Institution of Engineering and Technology

Abstract

Text-to-image synthesis refers to generating visually realistic and semantically consistent images from given textual descriptions. Previous approaches generate an initial low-resolution image and then refine it to high resolution. Despite remarkable progress, these methods do not fully utilize the given text and can generate text-mismatched images, especially when the text description is complex. We propose a novel fine-grained text-image fusion based generative adversarial network (FF-GAN), which consists of two modules: a fine-grained text-image fusion block (FF-Block) and global semantic refinement (GSR). The proposed FF-Block integrates an attention block and several convolution layers to effectively fuse the fine-grained word-context features into the corresponding visual features, so that the text information is fully used to refine the initial image with more details. The GSR is proposed to improve the global semantic consistency between linguistic and visual features during the refinement process. Extensive experiments on the CUB-200 and COCO datasets demonstrate the superiority of FF-GAN over other state-of-the-art approaches in generating images that are semantically consistent with the given texts.
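
The abstract does not give the internals of the FF-Block, so the following is only a minimal sketch of the general idea of fusing word-context features into visual features with attention. It assumes dot-product attention between image regions and word embeddings and a simple concatenation as the fusion step; the convolution layers are omitted. The function names (`word_context`, `fuse`) and the toy inputs are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def word_context(region, words):
    """Attention-weighted average of word vectors for one image region."""
    weights = softmax([dot(region, w) for w in words])
    dim = len(words[0])
    return [sum(wt * w[d] for wt, w in zip(weights, words)) for d in range(dim)]

def fuse(regions, words):
    """Assumed fusion step: concatenate each region with its word context."""
    return [r + word_context(r, words) for r in regions]

# Toy inputs: 2 image regions and 3 word embeddings, all 2-dimensional.
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
fused = fuse(regions, words)
```

Each fused vector here has twice the region dimension: the original visual feature followed by the word context that region attends to most. In a real network the concatenated features would typically pass through learned layers before refinement; that part is left out of this sketch.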

Keywords:

Computer science; Block (permutation group theory); Consistency (knowledge bases); Image (mathematics); Artificial intelligence; Context (archaeology); Convolution (computer science); Fuse (electrical); Natural language processing; Pattern recognition (psychology); Word (group theory); Semantics (computer science); Artificial neural network; Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 1.46

Citation Normalized Percentile: 0.79

Topics

Generative Adversarial Networks and Image Synthesis

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

A multilevel fine-grained feature fusion method for cross-modal image–text retrieval

Fine-grained Feature Assisted Cross-modal Image-text Retrieval

Fine-grained Text to Image Synthesis

TECMH: Transformer-Based Cross-Modal Hashing For Fine-Grained Image-Text Retrieval

Paired Cross-Modal Data Augmentation for Fine-Grained Image-to-Text Retrieval