JOURNAL ARTICLE

Disentangled Latent Diffusion: Unlocking Compositional Semantic Control for High-Fidelity Image Synthesis

Revista, ZenIA, 10

Year: 2025
Journal: Zenodo (CERN European Organization for Nuclear Research)
Publisher: European Organization for Nuclear Research

Abstract

The unprecedented advancements in text-to-image diffusion models have revolutionized digital content creation, yet a critical challenge persists: achieving precise, disentangled compositional semantic control. While these models excel at generating aesthetically pleasing images from broad textual prompts, they often struggle with fine-grained control over individual object attributes, their spatial relationships, and the coherent integration of multiple semantic elements within a single scene. This paper introduces Disentangled Latent Diffusion (DLD), a novel framework designed to address these limitations by explicitly separating distinct semantic factors within the latent space of a diffusion model. Our approach integrates a specialized disentanglement module that encourages the formation of independent latent dimensions corresponding to object identity, attributes, pose, and spatial location. This disentangled latent representation is then harnessed by a hierarchical compositional control mechanism, which allows users to specify prompts at varying granularities, from global scene descriptions to precise manipulation of individual components. Through a multi-stage training strategy incorporating self-supervised disentanglement objectives and a novel compositional consistency loss, DLD significantly enhances the model's ability to interpret and execute complex compositional instructions. Extensive quantitative and qualitative evaluations demonstrate that DLD achieves superior fidelity, semantic alignment, and, crucially, unprecedented levels of fine-grained compositional control compared to state-of-the-art baselines. This work represents a significant step towards more intuitive and controllable high-fidelity image synthesis, paving the way for advanced creative and professional applications.
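
As a rough illustration of what a disentangled latent makes possible, the sketch below partitions a latent vector into blocks for the four factors named in the abstract (object identity, attributes, pose, spatial location) and edits one factor by swapping its block between two latents. This is not the paper's implementation: the abstract gives no architecture details, so the block sizes, the `FACTOR_SLICES` layout, and the `swap_factor` helper are all hypothetical, and a real system would operate on learned diffusion latents rather than plain lists.

```python
from typing import Dict, List

# Hypothetical layout: which latent indices belong to which semantic factor.
# The abstract names the four factors but not their sizes; these are assumed.
FACTOR_SLICES: Dict[str, slice] = {
    "identity": slice(0, 4),
    "attributes": slice(4, 8),
    "pose": slice(8, 10),
    "location": slice(10, 12),
}


def swap_factor(target: List[float], source: List[float], factor: str) -> List[float]:
    """Return a copy of `target` whose `factor` block is taken from `source`."""
    if len(target) != len(source):
        raise ValueError("latents must have the same length")
    block = FACTOR_SLICES[factor]
    edited = list(target)  # copy so the original latent is left untouched
    edited[block] = source[block]
    return edited


# Toy latents standing in for two encoded images.
z_a = [0.0] * 12
z_b = [1.0] * 12

# Transfer only the pose block from z_b into z_a.
z_edit = swap_factor(z_a, z_b, "pose")
```

If the factors were truly independent, decoding `z_edit` would change only the pose of the object encoded by `z_a`; how closely a trained model approaches that ideal is what the disentanglement objectives described in the abstract are meant to improve.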

Keywords:
Representation (politics), Consistency (knowledge bases), Object (grammar), Latent semantic analysis, Control (management), Space (punctuation), Semantics (computer science)

Topics

Generative Adversarial Networks and Image Synthesis
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Aesthetic Perception and Analysis
Life Sciences →  Neuroscience →  Cognitive Neuroscience
