TextToucher: Fine-Grained Text-to-Touch Generation

Jiangping Tu; Hao Fu; Fengyu Yang; Hanbin Zhao; Chao Zhang; Hui Qian

doi:10.1609/aaai.v39i7.32802

JOURNAL ARTICLE

Year: 2025; Journal: Proceedings of the AAAI Conference on Artificial Intelligence; Vol: 39 (7); Pages: 7455-7463; Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Tactile sensation plays a crucial role in the development of multi-modal large models and embodied intelligence. To collect tactile data at as low a cost as possible, a series of studies have attempted to generate tactile images through vision-to-touch image translation. However, compared with the text modality, tactile generation driven by the visual modality cannot accurately depict human tactile sensation. In this work, we analyze the characteristics of tactile images in detail at two granularities: object-level (tactile texture, tactile shape) and sensor-level (gel status). We model both granularities of information through text descriptions and propose a fine-grained Text-to-Touch generation method (TextToucher) to generate high-quality tactile samples. Specifically, we use a multimodal large language model to construct text sentences describing object-level tactile information and employ a set of learnable text prompts to represent sensor-level tactile information. To better guide the tactile generation process with the constructed text information, we fuse the two grains of text information and explore various dual-grain text conditioning methods within the diffusion transformer architecture. Furthermore, we propose a Contrastive Text-Touch Pre-training (CTTP) metric to precisely evaluate the quality of text-driven generated tactile data. Extensive experiments demonstrate the superiority of our TextToucher method.
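The abstract names the CTTP metric but does not say how it is computed. The sketch below is only an illustration of the general contrastive text-touch idea, assuming a CLIP-style setup in which a text encoder and a tactile encoder each produce one embedding per sample. The function names (`cosine_score`, `symmetric_contrastive_loss`), the temperature value, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_score(text_emb, touch_emb):
    """Cosine similarity between one text embedding and one tactile embedding.

    Assumption: a higher score means the tactile sample matches the text better.
    """
    t = l2_normalize(text_emb)
    u = l2_normalize(touch_emb)
    return sum(a * b for a, b in zip(t, u))


def symmetric_contrastive_loss(text_embs, touch_embs, temperature=0.07):
    """CLIP-style symmetric InfoNCE loss over a batch of matched pairs.

    Row i of text_embs is assumed to describe row i of touch_embs.
    The temperature of 0.07 is an illustrative default, not a value from the paper.
    """
    n = len(text_embs)
    # Pairwise similarity logits: logits[i][j] compares text i with touch j.
    logits = [
        [cosine_score(t, u) / temperature for u in touch_embs]
        for t in text_embs
    ]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / n

    # Transpose to get the touch-to-text direction.
    cols = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(cols))


# Toy embeddings standing in for encoder outputs (purely illustrative).
texts = [[1.0, 0.0], [0.0, 1.0]]
touches_matched = [[2.0, 0.0], [0.0, 3.0]]
touches_swapped = [[0.0, 3.0], [2.0, 0.0]]

loss_matched = symmetric_contrastive_loss(texts, touches_matched)
loss_swapped = symmetric_contrastive_loss(texts, touches_swapped)
```

Under these assumptions, training would minimize the loss so that matched text-touch pairs score higher than mismatched ones, and the cosine score from the trained encoders could then be used to rate generated tactile samples against their text prompts.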

Keywords: Computer science

Metrics

FWCI (Field Weighted Citation Impact): 3.22

Citation Normalized Percentile: 0.81

Topics

Speech and dialogue systems

Physical Sciences → Computer Science → Artificial Intelligence

Human Motion and Animation

Physical Sciences → Engineering → Control and Systems Engineering

Interactive and Immersive Displays

Physical Sciences → Computer Science → Human-Computer Interaction
