Models based on latent diffusion models (LDMs), such as Stable Diffusion, have transformed image generation in recent years. However, the control that LDMs offer natively is often too imprecise for practical applications. This paper reviews and compares five classic or state-of-the-art conditional control mechanisms designed to address this limitation: ControlNet, T2I-Adapter, Composer, UniControl, and FreeControl. We analyze their architectural principles, their performance trade-offs (e.g., average FID score, computational cost, and inference speed), and their applicability across different domains. Our comparative analysis shows that UniControl and Composer are best suited to tasks with high quality requirements because of their strong fine-grained control, whereas T2I-Adapter and FreeControl are more efficient and better suited to mobile deployment because of their low computational demands. ControlNet, the earliest of the five mechanisms, remains effective and retains practical value. This overview provides a foundation for selecting an appropriate control mechanism for a specific image generation task.
Matteo Pettenò, Alessandro Mezza, Alberto Bernardini
Yaosi Hu, Zhenzhong Chen, Chong Luo
Haomiao Ni, Changhao Shi, Kai Li, Xiaolei Huang, Martin Renqiang Min