ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise Semantic Alignment and Generation

Jianan Wang; Guansong Lu; Hang Xu; Zhenguo Li; Chunjing Xu; Yanwei Fu

doi:10.1109/cvpr52688.2022.01044


JOURNAL ARTICLE


Year: 2022
Journal: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Pages: 10697-10707


Abstract

Existing text-guided image manipulation methods aim to modify the appearance of the image or to edit a few objects in a virtual or simple scenario, which is far from practical application. In this work, we study a novel task of text-guided image manipulation at the entity level in the real world. The task imposes three basic requirements: (1) to edit the entity consistently with the text descriptions, (2) to preserve the text-irrelevant regions, and (3) to merge the manipulated entity into the image naturally. To this end, we propose ManiTrans, a new transformer-based framework built on the two-stage image synthesis method, which can not only edit the appearance of entities but also generate new entities corresponding to the text guidance. Our framework incorporates a semantic alignment module to locate the image regions to be manipulated, and a semantic loss to help align the relationship between vision and language. We conduct extensive experiments on the real-world CUB, Oxford, and COCO datasets to verify that our method can distinguish relevant from irrelevant regions and achieve more precise and flexible manipulation than baseline methods.
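The abstract does not say how the semantic alignment module is implemented, so the following is only a toy sketch of the general idea behind requirements (1) and (2): mark the image tokens that are relevant to the text, replace only those, and leave the rest untouched. The cosine-similarity score, the 0.5 threshold, the function names, and all numeric values are assumptions made for illustration and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def relevance_mask(token_embs, text_emb, threshold=0.5):
    """Hypothetical alignment step: flag tokens similar to the text embedding."""
    return [cosine(t, text_emb) >= threshold for t in token_embs]

def manipulate(tokens, mask, generated):
    """Replace only the flagged tokens; text-irrelevant tokens are preserved."""
    return [g if m else t for t, m, g in zip(tokens, mask, generated)]

# Toy data (hypothetical): four image tokens with 2-D embeddings.
token_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
text_emb = [1.0, 0.0]             # embedding of the entity named in the text
tokens = [11, 12, 13, 14]         # original image token ids
generated = [91, 92, 93, 94]      # token ids proposed by a generator

mask = relevance_mask(token_embs, text_emb)
edited = manipulate(tokens, mask, generated)
print(mask)    # [True, True, False, False]
print(edited)  # [91, 92, 13, 14]
```

In this toy setting the first two tokens are judged relevant and replaced, while the last two keep their original ids, which mirrors the requirement to preserve text-irrelevant regions.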

Keywords:

Computer science; Security token; Merge (version control); Artificial intelligence; Image (mathematics); Natural language processing; Resolver; Information retrieval

Metrics

FWCI (Field Weighted Citation Impact): 0.97

Citation Normalized Percentile: 0.82

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Generative Adversarial Networks and Image Synthesis

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Digital Media Forensic Detection

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition


Related Documents

Entity-Level Alignment with Prompt-Guided Adapter for Remote Sensing Image-Text Retrieval

SeTGAN: Semantic‐text guided face image generation

Learning semantic alignment from image for text-guided image inpainting

Sound-Guided Semantic Image Manipulation

TediGAN: Text-Guided Diverse Face Image Generation and Manipulation