JOURNAL ARTICLE

CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation

Abstract

Generating shapes using natural language can enable new ways of imagining and creating the things around us. While significant recent progress has been made in text-to-image generation, text-to-shape generation remains a challenging problem due to the unavailability of paired text and shape data at a large scale. We present a simple yet effective method for zero-shot text-to-shape generation that circumvents such data scarcity. Our proposed method, named CLIP-Forge, is based on a two-stage training process, which only depends on an unlabelled shape dataset and a pre-trained image-text network such as CLIP. Our method has the benefits of avoiding expensive inference time optimization, as well as the ability to generate multiple shapes for a given text. We not only demonstrate promising zero-shot generalization of the CLIP-Forge model qualitatively and quantitatively, but also provide extensive comparative evaluations to better understand its behavior.
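The two-stage pipeline described in the abstract can be sketched at a very high level: stage one learns a shape autoencoder on unlabelled shapes, stage two learns a conditional prior over shape latents conditioned on CLIP embeddings of shape renderings, and at inference the CLIP text embedding is substituted for the image embedding to sample latents that are then decoded. The sketch below is purely illustrative, assuming nothing beyond that description: the random linear maps (`W_dec`, `W_prior`), `decode_shape`, and `sample_latents` are hypothetical stand-ins for trained networks, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D_CLIP, D_LAT = 512, 128  # illustrative embedding and latent sizes

# Stage 1 (assumed): a shape autoencoder yields latents z; a random linear
# map stands in for the trained decoder here.
W_dec = rng.normal(size=(D_LAT, 32))

def decode_shape(z):
    """Placeholder decoder: maps a shape latent to a tiny fake occupancy vector."""
    return (z @ W_dec > 0).astype(np.uint8)

# Stage 2 (assumed): a conditional prior p(z | CLIP embedding) is trained on
# CLIP *image* embeddings of shape renderings. Because CLIP aligns text and
# images in one space, the text embedding can be swapped in at test time.
W_prior = rng.normal(size=(D_CLIP, D_LAT)) / np.sqrt(D_CLIP)

def sample_latents(clip_embedding, n=3, noise=0.1):
    """Stand-in for the trained conditional model: condition-dependent mean
    plus noise, so one text prompt yields multiple distinct shapes."""
    mean = clip_embedding @ W_prior
    return mean + noise * rng.normal(size=(n, D_LAT))

# Inference: a random unit vector stands in for the CLIP text embedding.
text_emb = rng.normal(size=D_CLIP)
text_emb /= np.linalg.norm(text_emb)

shapes = [decode_shape(z) for z in sample_latents(text_emb)]
print(len(shapes), shapes[0].shape)
```

The key property this illustrates is the one the abstract emphasizes: no paired text-shape data is used anywhere, and sampling several latents per prompt gives multiple shapes for the same text without any per-query optimization.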

Keywords:
Zero-shot learning; Text-to-shape generation; Computer science; Artificial intelligence; Pattern recognition; Inference; Generalization; Engineering

Metrics

- Cited by: 193
- FWCI (Field-Weighted Citation Impact): 13.33
- References: 96
- Citation Normalized Percentile: 0.99 (in the top 1% and top 10%)

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
3D Shape Modeling and Analysis
Physical Sciences →  Engineering →  Computational Mechanics
Handwritten Text Recognition Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition