Audio-driven Talking Head Generation with Transformer and 3D Morphable Model

Ricong Huang; Weizhi Zhong; Guanbin Li

doi:10.1145/3503161.3551574

Year: 2022
Journal: Proceedings of the 30th ACM International Conference on Multimedia
Pages: 7035-7039

Abstract

In talking head generation, it is hard to learn a direct mapping from the input audio signal to the generated head images. To tackle this challenge, we propose to first learn the mapping from the input audio signal to the parameters of a three-dimensional morphable face model (3DMM), which is easier to learn. The 3DMM parameters are then used to guide the generation of high-quality talking head images. Prior works mostly encode audio features from short audio windows, whose limited context can reduce the accuracy of lip movements. In this paper, we propose a transformer-based audio encoder that makes full use of the long-term audio context to predict a sequence of 3DMM parameters accurately. Unlike prior works that use only the 3DMM parameters of expression, rotation and translation, we propose to also include the identity parameters. Since the locations of the 3D facial mesh points are determined by both the expression and identity parameters, considering the identity parameters helps provide more subtle control of lip movement. Experimental results show that our method ranks first in 4 of the 11 evaluation metrics and takes first place in the talking head generation track.
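The abstract's point that mesh point locations depend on both expression and identity parameters can be illustrated with the commonly used linear 3DMM formulation, in which the vertices are a mean shape plus an identity offset plus an expression offset. The sketch below is a toy illustration under that assumption, not the authors' implementation: the function names, the two-vertex "lip" mesh, and all basis values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
# Toy linear 3DMM: vertices = mean_shape + id_basis * alpha + exp_basis * beta.
# ASSUMPTION: every number and name here is made up for illustration; real
# 3DMMs ship learned bases with tens of thousands of vertices.


def matvec(basis, coeffs):
    """Multiply a basis stored as a list of rows by a coefficient vector."""
    return [sum(b * c for b, c in zip(row, coeffs)) for row in basis]


def reconstruct_vertices(mean_shape, id_basis, exp_basis, alpha, beta):
    """Return flattened vertex coordinates [x0, y0, z0, x1, y1, z1, ...]."""
    id_offset = matvec(id_basis, alpha)    # contribution of identity parameters
    exp_offset = matvec(exp_basis, beta)   # contribution of expression parameters
    return [m + i + e for m, i, e in zip(mean_shape, id_offset, exp_offset)]


def lip_gap(vertices):
    """Vertical distance between upper-lip vertex 0 and lower-lip vertex 1."""
    return vertices[1] - vertices[4]


# Two hypothetical lip vertices (6 coordinates), 2 identity coefficients,
# 1 expression coefficient ("mouth open").
MEAN_SHAPE = [0.0, 1.0, 0.0, 0.0, -1.0, 0.0]
ID_BASIS = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.0],
            [0.0, 0.0], [-0.5, 0.0], [0.0, 0.25]]
EXP_BASIS = [[0.0], [1.0], [0.0], [0.0], [-1.0], [0.0]]

neutral = reconstruct_vertices(MEAN_SHAPE, ID_BASIS, EXP_BASIS, [0.0, 0.0], [0.0])
open_mouth = reconstruct_vertices(MEAN_SHAPE, ID_BASIS, EXP_BASIS, [0.0, 0.0], [0.5])
other_face = reconstruct_vertices(MEAN_SHAPE, ID_BASIS, EXP_BASIS, [1.0, 0.0], [0.5])

print(lip_gap(neutral), lip_gap(open_mouth), lip_gap(other_face))
```

In this toy setup the same expression coefficient yields a different lip gap once the identity coefficients change, which is the intuition behind predicting identity parameters alongside expression, rotation and translation.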

Keywords:

Computer science; Transformer; Speech recognition; Encoder; Artificial intelligence; Face (sociological concept); Computer vision; Voltage

Metrics

FWCI (Field Weighted Citation Impact): 0.90

Citation Normalized Percentile: 0.81

Topics

Face recognition and analysis

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Generative Adversarial Networks and Image Synthesis

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Audio-Driven Talking Head Video Generation with Diffusion Model

One-shot motion talking head generation with audio-driven model

Facial Expression-Aware Talking Head Generation with 3D Morphable Model

Talking Head Generation Based on 3D Morphable Facial Model

Audio-Driven Talking Head Generation with Emotion Based on FLAME Geometry Model