JOURNAL ARTICLE

Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis

Abstract

We present a novel one-shot talking head synthesis method that achieves disentangled and fine-grained control over lip motion, eye gaze and blink, head pose, and emotional expression. We represent the different motions via disentangled latent representations and leverage an image generator to synthesize talking heads from them. To effectively disentangle each motion factor, we propose a progressive disentangled representation learning strategy that separates the factors in a coarse-to-fine manner: we first extract a unified motion feature from the driving signal, and then isolate each fine-grained motion from the unified feature. We apply motion-specific contrastive learning and regression for the non-emotional motions, and introduce feature-level decorrelation and self-reconstruction for emotional expression, fully exploiting the inherent properties of each motion factor in unstructured video data to achieve disentanglement. Experiments show that our method provides high-quality speech-to-lip-motion synchronization along with precise, disentangled control over multiple additional facial motions, which previous methods can hardly achieve.
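The abstract's feature-level decorrelation between emotional expression and the other motion factors can be illustrated with a minimal sketch. The code below assumes a simple cross-covariance penalty (squared Frobenius norm of the batch cross-covariance matrix between the two feature groups); the paper's actual decorrelation objective may differ, and the function name `decorrelation_loss` is hypothetical.

```python
import numpy as np

def decorrelation_loss(expr_feat: np.ndarray, motion_feat: np.ndarray) -> float:
    """Penalty on the cross-covariance between two feature batches.

    expr_feat:   (N, D1) batch of emotional-expression features
    motion_feat: (N, D2) batch of non-emotional motion features

    Driving this toward zero encourages the two feature groups to be
    (linearly) uncorrelated, i.e. statistically disentangled.
    """
    n = expr_feat.shape[0]
    e = expr_feat - expr_feat.mean(axis=0, keepdims=True)    # center each dim
    m = motion_feat - motion_feat.mean(axis=0, keepdims=True)
    cov = e.T @ m / (n - 1)            # (D1, D2) cross-covariance matrix
    return float(np.sum(cov ** 2))     # squared Frobenius norm
```

In training, such a term would be added to the reconstruction and contrastive objectives; independent feature batches yield a loss near zero, while correlated ones are penalized.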

Keywords:
Computer science; Artificial intelligence; Computer vision; Motion; Decorrelation; Feature learning; Representation learning; Generative models; Gaze; Speech recognition

Metrics

- Cited by: 42
- FWCI (Field-Weighted Citation Impact): 7.64
- References: 112
- Citation Normalized Percentile: 0.97 (in top 10%)

Topics

Generative Adversarial Networks and Image Synthesis
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Face recognition and analysis
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing