Cross-Modal Semantic Matching Generative Adversarial Networks for Text-to-Image Synthesis

Hongchen Tan; Xiuping Liu; Baocai Yin; Xin Li

doi:10.1109/tmm.2021.3060291

JOURNAL ARTICLE

Year: 2021; Journal: IEEE Transactions on Multimedia; Vol: 24; Pages: 832-845; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Synthesizing photo-realistic images from text descriptions is a challenging image generation problem. Although many recent approaches have significantly advanced the performance of text-to-image generation, guaranteeing semantic matching between the text description and the synthesized image remains very challenging. In this paper, we propose a new model, Cross-modal Semantic Matching Generative Adversarial Networks (CSM-GAN), to improve the semantic consistency between the text description and the synthesized image for fine-grained text-to-image generation. Two new modules are proposed in CSM-GAN: the Text Encoder Module (TEM) and the Textual-Visual Semantic Matching Module (TVSMM). TVSMM aims to make the distance between a synthesized image and its corresponding text description smaller, in the global semantic embedding space, than the distance between mismatched pairs. This improves semantic consistency and, consequently, the generalizability of CSM-GAN. In TEM, we introduce Text Convolutional Neural Networks (Text_CNNs) to capture and highlight local visual features in textual descriptions. Thorough experiments on two public benchmark datasets demonstrate the superiority of CSM-GAN over other representative state-of-the-art methods.
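The matching objective the abstract attributes to TVSMM (matched image-text pairs closer than mismatched ones in a global embedding space) can be illustrated with a generic hinge-style ranking loss. The abstract does not state the actual loss, so this is only a sketch under assumptions: the cosine distance, the margin of 0.2, and the names `cosine_distance` and `matching_hinge_loss` are all hypothetical and not taken from the paper.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def matching_hinge_loss(image_embs, text_embs, margin=0.2):
    """Average hinge ranking loss over a batch of global embeddings.

    Assumption: image_embs[i] and text_embs[i] form a matched pair, and
    every text_embs[j] with j != i is a mismatched description for image i.
    The loss is zero once each matched pair is closer than every
    mismatched pair by at least `margin` (an illustrative value).
    """
    total, count = 0.0, 0
    for i, image in enumerate(image_embs):
        d_matched = cosine_distance(image, text_embs[i])
        for j, text in enumerate(text_embs):
            if j == i:
                continue
            d_mismatched = cosine_distance(image, text)
            total += max(0.0, margin + d_matched - d_mismatched)
            count += 1
    return total / count if count else 0.0


# Toy 2-D embeddings: aligned pairs incur no loss, swapped pairs are penalized.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = matching_hinge_loss(images, aligned_texts)
loss_swapped = matching_hinge_loss(images, swapped_texts)
```

In the toy batch, the aligned texts already satisfy the margin, while the swapped texts place each image nearer to the wrong description, so only the second case contributes to the loss.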

Keywords:

Computer science; Consistency (knowledge bases); Artificial intelligence; Benchmark (surveying); Convolutional neural network; Generative grammar; Generalizability theory; Image (mathematics); Embedding; Semantics (computer science); Generative adversarial network; Matching (statistics); Encoder; Semantic space; Natural language processing; Pattern recognition (psychology)

Metrics

FWCI (Field Weighted Citation Impact): 2.96

Citation Normalized Percentile: 0.92

Topics

Generative Adversarial Networks and Image Synthesis

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Video Analysis and Summarization

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Cross-modal Feature Alignment based Hybrid Attentional Generative Adversarial Networks for text-to-image synthesis

Multi-scale dual-modal generative adversarial networks for text-to-image synthesis

Vision-Language Matching for Text-to-Image Synthesis via Generative Adversarial Networks

Bi-affine Semantic Fusion Generative Adversarial Networks for Text-to-Image Synthesis

SF-GAN: Semantic fusion generative adversarial networks for text-to-image synthesis