Abstract: We propose a more effective Deep Fusion Generative Adversarial Network (DF-GAN) for synthesizing high-quality, realistic images from text descriptions. The main challenges in this task are the entanglement between generators of different image scales, the reliance on extra networks for text-image semantic consistency, and the computational cost of cross-modal attention-based fusion. Our proposed approach addresses these challenges as follows
Fanrong Meng, Dezhi Han, Xiang Shen, Chongqing Chen, Qun Wang
Xu Ouyang, Ying Chen, Kaiyue Zhu, Gady Agam
Duy Le, Bao Q. Bui, Anh Tran, Cong Tran, Cuong Hung Pham
Wenting Chen, Pengyu Wang, Hui Ren, Lichao Sun, Quanzheng Li, Yixuan Yuan, Xiang Li