Uncovering Limitations in Text-to-Image Generation: A Contrastive Approach with Structured Semantic Alignment

Qianyu Feng; Yulei Sui; Hongyu Zhang

doi:10.18653/v1/2023.findings-emnlp.595


JOURNAL ARTICLE

Year: 2023 Pages: 8876-8888

Abstract

Despite significant advances, text-to-image generation models still struggle to produce highly detailed or complex images from textual descriptions. To explore these limitations, we propose Structured Semantic Alignment (SSA), a method for evaluating text-to-image generation models. SSA learns structured semantic embeddings across modalities and aligns them in a joint space. The method proceeds in four steps: (i) generating mutated prompts by substituting words with semantically equivalent or nonequivalent alternatives while preserving the original syntax; (ii) representing sentence structure through parse trees obtained via syntactic parsing; (iii) learning fine-grained structured embeddings that project semantic features from different modalities into a shared embedding space; (iv) evaluating the semantic consistency between the structured text embeddings and the corresponding visual embeddings. Experiments on various benchmarks demonstrate that SSA offers improved measurement of the semantic consistency of text-to-image generation models. It also reveals a wide range of generation errors, including under-generation, incorrect constituency, incorrect dependency, and semantic confusion. By uncovering these biases and limitations, the method provides insight into the models' shortcomings in real-world scenarios.
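Steps (i) and (iv) of the abstract can be illustrated with a small sketch. This is not the authors' implementation: the substitution table, the function names `mutate_prompt` and `cosine_similarity`, and the use of plain cosine similarity as the consistency score are all assumptions made for illustration. In practice the embeddings would come from the learned structured encoders described in steps (ii) and (iii), which are not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical substitution table (not from the paper). Each word maps to
# (replacement, is_semantically_equivalent) pairs.
SUBSTITUTIONS: Dict[str, List[Tuple[str, bool]]] = {
    "dog": [("puppy", True), ("cat", False)],
    "red": [("crimson", True), ("blue", False)],
}


def mutate_prompt(prompt: str) -> List[Tuple[str, bool]]:
    """Return (mutated_prompt, is_equivalent) pairs, swapping one word at a time.

    Only single tokens are replaced, so word order (and hence the syntax of
    the prompt) is left unchanged.
    """
    tokens = prompt.split()
    mutants: List[Tuple[str, bool]] = []
    for i, token in enumerate(tokens):
        for replacement, equivalent in SUBSTITUTIONS.get(token.lower(), []):
            mutated = tokens[:i] + [replacement] + tokens[i + 1:]
            mutants.append((" ".join(mutated), equivalent))
    return mutants


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    for mutated, equivalent in mutate_prompt("a red dog on the grass"):
        label = "equivalent" if equivalent else "nonequivalent"
        print(f"{label:>13}: {mutated}")
```

Under this sketch, a well-aligned model would be expected to score high similarity between the image generated for a prompt and the text embedding of an equivalent mutant, and lower similarity for a nonequivalent one.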

Keywords:

Computer science; Parsing; Natural language processing; Artificial intelligence; Consistency (knowledge bases); Embedding; Syntax; Semantics (computer science); Information retrieval; Programming language

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.17

Topics

Generative Adversarial Networks and Image Synthesis

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Video Analysis and Summarization

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Emotion-conditional Image Generation Reflecting Semantic Alignment with Text-to-Image Models

High Fidelity Text to Image Generation with Contrastive Alignment and Structural Guidance

Semantic Alignment Through Implicit Reasoning: Revolutionizing Text-to-Image Generation

Multi-modal Semantic Alignment Based on Extended Image-Text Contrastive Learning