Yuran Xiang, Haiteng Zhao, Chang Ma, Zhi-Hong Deng
Recent advancements in computational chemistry have increasingly emphasized generating and editing molecules from textual instructions. However, integrating graph generation with instruction understanding remains challenging: most existing approaches either rely on molecular sequences in the text modality, which carry limited structural information, or struggle with multimodal alignment in graph diffusion methods. To address these limitations, we propose UTGDiff (Unified Text-Graph Diffusion Model), a novel framework that leverages pre-trained language models for discrete graph diffusion, enabling the generation of molecular graphs from instructions. UTGDiff introduces a unified text-graph transformer as a denoising network, adapted from language models with minimal modifications to process graph data via attention bias. Experimental results show that UTGDiff consistently outperforms both sequence-based and conditional graph-diffusion baselines on instruction-based molecule generation and editing tasks with fewer parameters, covering instructions that specify molecular structures or properties.
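The abstract's core architectural idea, injecting graph structure into a standard transformer through an additive attention bias, can be illustrated with a minimal sketch. The function names and the scalar bonded-pair bias below are illustrative assumptions, not the paper's actual implementation; in UTGDiff the bias would be a learned, edge-type-dependent term inside a pre-trained language model's attention layers.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def biased_attention(Q, K, V, attn_bias):
    """Single-head attention with an additive bias on the score matrix.
    Graph structure enters only through attn_bias: entry (i, j) encodes
    the relation (e.g. bond type) between tokens/atoms i and j, so the
    transformer itself needs no other modification to see the graph."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d) + attn_bias  # standard scores + graph bias
    return softmax(scores) @ V

# Toy example: 4 atoms, hidden size 8.
rng = np.random.default_rng(0)
n, d = 4, 8
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
# Adjacency-derived bias: bonded pairs receive a (here fixed) scalar boost.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
out = biased_attention(Q, K, V, attn_bias=adj * 1.0)
```

In a unified text-graph transformer, text tokens and atom tokens would share one sequence, with the bias applied only on atom-atom pairs; a denoising network for discrete diffusion then predicts clean atom/bond categories from a noised graph.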