People talk with diverse styles. Given the same speech content, different talking styles exhibit significant differences in facial and head-pose movements. For example, an "excited" style usually talks with the mouth wide open, while a "solemn" style is more restrained and seldom exhibits exaggerated motions. Given such large differences between styles, it is necessary to incorporate talking style into the audio-driven talking face synthesis framework. In this paper, we propose to inject style into the talking face synthesis framework by imitating the arbitrary talking style of a particular reference video. Specifically, we systematically investigate talking styles with our collected \textit{Ted-HD} dataset and construct style codes as several statistics of 3D morphable model~(3DMM) parameters. Afterwards, we devise a latent-style-fusion~(LSF) model to synthesize stylized talking faces by imitating talking styles from the style codes. We emphasize the following novel characteristics of our framework: (1) It does not require any style annotation; the talking style is learned in an unsupervised manner from talking videos in the wild. (2) It can imitate arbitrary styles from arbitrary videos, and the style codes can also be interpolated to generate new styles. Extensive experiments demonstrate that the proposed framework synthesizes more natural and expressive talking styles than baseline methods.
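The idea of "style codes as several statistics of 3DMM parameters" and of interpolating codes to obtain new styles could be sketched as follows. This is a minimal illustration only: the choice of mean and standard deviation as the statistics, and the function names `style_code` and `interpolate_styles`, are assumptions for exposition, not the paper's exact specification.

```python
import numpy as np

def style_code(params_3dmm):
    """Reduce a (num_frames, dim) sequence of per-frame 3DMM parameters
    to a fixed-length style code by stacking per-dimension statistics.
    Mean and standard deviation are used here as an assumed choice of
    statistics; the actual framework may use others."""
    params = np.asarray(params_3dmm, dtype=float)
    return np.concatenate([params.mean(axis=0), params.std(axis=0)])

def interpolate_styles(code_a, code_b, alpha):
    """Linearly blend two style codes to form a new style:
    alpha=0 returns code_a, alpha=1 returns code_b."""
    code_a = np.asarray(code_a, dtype=float)
    code_b = np.asarray(code_b, dtype=float)
    return (1.0 - alpha) * code_a + alpha * code_b

# Toy usage with random stand-in parameter sequences (real inputs would
# be 3DMM expression/pose coefficients extracted from a reference video).
rng = np.random.default_rng(0)
excited = style_code(rng.normal(loc=0.0, scale=2.0, size=(120, 64)))
solemn = style_code(rng.normal(loc=0.0, scale=0.3, size=(120, 64)))
blended = interpolate_styles(excited, solemn, 0.5)
```

A fixed-length code of this kind is convenient because it is independent of the reference video's length, and linear interpolation between two codes stays in the same space, which is what makes "generating new styles" by blending well-defined.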