JOURNAL ARTICLE

MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation

Seyeon Kim, Siyoon Jin, Jihye Park, KiHong Kim, Ji Young Kim, Jisu Nam, Seungryong Kim

Year: 2025
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Vol: 39 (4)
Pages: 4302-4310
Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Conventional GAN-based models for talking head generation often suffer from limited quality and unstable training. Recent approaches based on diffusion models have attempted to address these limitations and improve fidelity. However, they still face challenges, such as long sampling times and difficulty maintaining temporal consistency due to the high stochasticity of diffusion models. To overcome these challenges, we propose a novel motion-disentangled diffusion model for high-quality talking head generation, called MoDiTalker. We introduce two modules: the Audio-To-Motion (AToM) module, designed to generate synchronized lip movements from audio, and the Motion-To-Video (MToV) module, designed to produce high-quality talking head videos based on the generated motions. AToM excels at capturing subtle lip movements by leveraging an audio attention mechanism. Additionally, MToV enhances temporal consistency by utilizing an efficient tri-plane representation. Our experiments on standard benchmarks demonstrate that our model outperforms existing GAN-based and diffusion-based models. We also provide comprehensive ablation studies and user study results.
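The abstract names a tri-plane representation as the source of MToV's efficiency but does not describe how it is built. As a rough illustration of the general idea only, and not the authors' implementation, the Python sketch below factorizes a video volume indexed by time t and pixel position (y, x) into three 2D planes (xy, xt, yt) by averaging along the missing axis, then approximates a per-point value by summing the three plane entries. The functions `triplane_project`, `triplane_query`, and `storage_counts`, the mean-pooling projection, and the sum aggregation are all hypothetical choices made for this sketch; MToV is part of a diffusion model, and how it constructs and uses its planes is not stated in this abstract.

```python
from typing import List, Tuple

# Hypothetical types for illustration: a single-channel video volume indexed [t][y][x].
Video = List[List[List[float]]]
Plane = List[List[float]]


def triplane_project(video: Video) -> Tuple[Plane, Plane, Plane]:
    """Collapse a T x H x W volume into three axis-aligned planes by averaging.

    Returns (xy, xt, yt), where
      xy[y][x] is the mean over time t    (H x W, a static "appearance" plane),
      xt[t][x] is the mean over rows y    (T x W),
      yt[t][y] is the mean over columns x (T x H).
    Mean pooling is an assumption of this sketch, not MoDiTalker's encoder.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    xy = [[sum(video[t][y][x] for t in range(T)) / T for x in range(W)] for y in range(H)]
    xt = [[sum(video[t][y][x] for y in range(H)) / H for x in range(W)] for t in range(T)]
    yt = [[sum(video[t][y][x] for x in range(W)) / W for y in range(H)] for t in range(T)]
    return xy, xt, yt


def triplane_query(planes: Tuple[Plane, Plane, Plane], t: int, y: int, x: int) -> float:
    """Approximate the value at (t, y, x) by summing the three plane entries.

    Summation is one common aggregation choice; concatenation followed by a
    small decoder network is another. Neither is confirmed for MoDiTalker.
    """
    xy, xt, yt = planes
    return xy[y][x] + xt[t][x] + yt[t][y]


def storage_counts(T: int, H: int, W: int) -> Tuple[int, int]:
    """Number of stored values for the full volume vs. the three planes."""
    return T * H * W, H * W + T * W + T * H


if __name__ == "__main__":
    # Toy 4-frame, 2x3 "video" whose values vary smoothly in t, y, and x.
    video = [[[float(t + y + x) for x in range(3)] for y in range(2)] for t in range(4)]
    planes = triplane_project(video)
    print(triplane_query(planes, 0, 0, 0))  # 1.5 + 0.5 + 1.0 = 3.0
    print(storage_counts(16, 32, 32))       # (16384, 2048)
```

For a 16-frame, 32×32 volume the three planes hold 2,048 values instead of 16,384, which is the kind of saving that makes tri-planes attractive for video. The toy query also shows the cost: it returns 3.0 for a voxel whose true value is 0.0, because mean pooling followed by summation is lossy; practical tri-plane models typically pair the planes with learned encoders and decoders rather than fixed averaging.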
