JOURNAL ARTICLE

Audio-driven Talking Head Generation with Transformer and 3D Morphable Model

Ricong Huang, Weizhi Zhong, Guanbin Li

Year: 2022
Journal: Proceedings of the 30th ACM International Conference on Multimedia
Pages: 7035-7039

Abstract

In talking head generation, it is hard to learn the mapping between the generated head image and the input audio signal. To tackle this challenge, we propose to first learn the mapping between the input audio signal and the parameters of a three-dimensional morphable face model (3DMM), which is easier to learn. The 3DMM parameters are then used to guide the generation of high-quality talking head images. Prior works mostly encode audio features from short audio windows, whose limited context can reduce the accuracy of lip movements. In this paper, we propose a transformer-based audio encoder that makes full use of the long-term context in the audio to predict a sequence of 3DMM parameters accurately. Unlike prior works that use only the 3DMM parameters of expression, rotation, and translation, we propose to also include the identity parameters. Since the locations of the 3D facial mesh points are determined by both the expression and identity parameters, considering the identity parameters helps provide more subtle control of lip movement. The experimental results show that our method ranks first in 4 of the 11 evaluation metrics and ranks first in the talking head generation track.
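The abstract's point that mesh point locations depend on both expression and identity parameters can be illustrated with a minimal sketch. It assumes the commonly used linear 3DMM formulation (mean shape plus identity and expression bases); the abstract does not specify the exact model, so the function name, basis sizes, and random toy data below are hypothetical and not taken from the paper.

```python
import numpy as np


def reconstruct_vertices(mean_shape, id_basis, exp_basis, id_coeffs, exp_coeffs):
    """Assumed linear 3DMM: vertices = mean + id_basis @ id + exp_basis @ exp."""
    flat = mean_shape + id_basis @ id_coeffs + exp_basis @ exp_coeffs
    return flat.reshape(-1, 3)  # one (x, y, z) row per mesh vertex


# Toy dimensions chosen only for illustration (hypothetical, not from the paper).
rng = np.random.default_rng(0)
num_vertices, num_id, num_exp = 5, 4, 3

mean_shape = rng.normal(size=num_vertices * 3)
id_basis = rng.normal(size=(num_vertices * 3, num_id))
exp_basis = rng.normal(size=(num_vertices * 3, num_exp))

# Same expression coefficients, two different identities.
exp_coeffs = rng.normal(size=num_exp)
id_a = np.zeros(num_id)
id_b = rng.normal(size=num_id)

verts_a = reconstruct_vertices(mean_shape, id_basis, exp_basis, id_a, exp_coeffs)
verts_b = reconstruct_vertices(mean_shape, id_basis, exp_basis, id_b, exp_coeffs)

# The per-vertex offset between the two meshes comes entirely from identity.
identity_offset = verts_b - verts_a
```

Under this assumed model, two faces with identical expression coefficients still place their mesh points (including lip vertices) differently when their identity coefficients differ, which is consistent with the abstract's motivation for predicting identity parameters as well.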

Keywords:
Computer science, Transformer, Speech recognition, Encoder, Artificial intelligence, Face (sociological concept), Computer vision, Voltage

Metrics

Cited By: 13
FWCI (Field Weighted Citation Impact): 0.90
Refs: 28
Citation Normalized Percentile: 0.81

Topics

Face recognition and analysis
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Generative Adversarial Networks and Image Synthesis
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing