JOURNAL ARTICLE

Improving Speech Related Facial Action Unit Recognition by Audiovisual Information Fusion

Zibo Meng, Shizhong Han, Ping Liu, Yan Tong

Year: 2018
Journal: IEEE Transactions on Cybernetics
Vol: 49 (9)
Pages: 3293-3306
Publisher: Institute of Electrical and Electronics Engineers

Abstract

Recognizing facial action units (AUs) from spontaneous facial displays is challenging, especially when the displays are accompanied by speech. The major reason is that current practice extracts information from a single source, the visual channel, even though facial activity is highly correlated with voice in natural human communication. Instead of solely improving visual observations, this paper presents a novel audiovisual fusion framework that exploits both visual and acoustic cues to recognize speech-related facial AUs. In particular, a dynamic Bayesian network is employed to explicitly model the semantic and dynamic physiological relationships between AUs and phonemes, as well as measurement uncertainty. Experiments on a pilot audiovisual AU-coded database demonstrate that the proposed framework significantly outperforms state-of-the-art visual-based methods in recognizing speech-related AUs, especially those AUs whose visual observations are impaired during speech. More importantly, by explicitly modeling and exploiting the physiological relationships between AUs and phonemes, it is also superior to audio-based methods and to feature-level fusion methods that employ low-level audio features.
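
To make the fusion idea concrete, the following is a minimal toy sketch of forward filtering in a two-slice dynamic Bayesian network that combines a noisy visual AU detector with a noisy phoneme recognizer. It is not the network from the paper: the abstract does not give the structure or parameters, so the phoneme set, the single binary AU, every probability table, and the names `forward_filter`, `au_prob`, and `audio_likelihood` are illustrative assumptions.

```python
from itertools import product

# All values below are hypothetical, chosen only to illustrate the idea.
PHONEMES = ["sil", "b", "aa"]   # assumed toy phoneme set
AU_STATES = [0, 1]              # one binary AU, e.g. a lip-closing action

# P(phoneme_t | phoneme_{t-1}): assumed phoneme dynamics.
PHONE_TRANS = {
    "sil": {"sil": 0.6, "b": 0.2, "aa": 0.2},
    "b":   {"sil": 0.1, "b": 0.5, "aa": 0.4},
    "aa":  {"sil": 0.2, "b": 0.2, "aa": 0.6},
}

# P(AU_t = 1 | AU_{t-1}, phoneme_t): assumed AU-phoneme relationship,
# where producing "b" makes the lip-closing AU very likely.
AU_ON = {
    (0, "sil"): 0.10, (1, "sil"): 0.30,
    (0, "b"):   0.80, (1, "b"):   0.95,
    (0, "aa"):  0.02, (1, "aa"):  0.10,
}

# P(visual detector fires | AU_t): a deliberately unreliable detector,
# standing in for visual observations impaired during speech.
VIS_ON = {0: 0.3, 1: 0.6}


def audio_likelihood(label, phoneme):
    """P(recognized label | true phoneme) for an assumed noisy recognizer."""
    return 0.8 if label == phoneme else 0.1


def au_prob(a, a_prev, phoneme):
    """P(AU_t = a | AU_{t-1} = a_prev, phoneme_t = phoneme)."""
    p_on = AU_ON[(a_prev, phoneme)]
    return p_on if a == 1 else 1.0 - p_on


def forward_filter(visual_obs, audio_obs):
    """Return P(AU_t = 1 | observations up to t) for each frame t."""
    states = list(product(PHONEMES, AU_STATES))
    # Assumption: the sequence starts from silence with the AU inactive.
    belief = {s: 0.0 for s in states}
    belief[("sil", 0)] = 1.0
    posteriors = []
    for v, label in zip(visual_obs, audio_obs):
        new_belief = {}
        for p, a in states:
            # Predict: propagate the previous belief through the dynamics.
            predicted = sum(
                belief[(pp, ap)] * PHONE_TRANS[pp][p] * au_prob(a, ap, p)
                for pp, ap in states
            )
            # Update: weight by both measurement likelihoods.
            like_v = VIS_ON[a] if v == 1 else 1.0 - VIS_ON[a]
            new_belief[(p, a)] = predicted * like_v * audio_likelihood(label, p)
        total = sum(new_belief.values())
        belief = {s: x / total for s, x in new_belief.items()}
        # Marginalize out the phoneme to get the AU posterior.
        posteriors.append(sum(belief[(p, 1)] for p in PHONEMES))
    return posteriors
```

Calling `forward_filter([0, 0], ["b", "b"])` and `forward_filter([0, 0], ["aa", "aa"])` uses the same visual evidence (the detector misses the AU in both frames) with different audio evidence. Under these assumed tables, the audio channel raises the AU posterior when the recognized phoneme is one that physiologically requires the AU, which is the kind of effect the abstract attributes to modeling AU-phoneme relationships explicitly.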

Keywords:
Computer science, Speech recognition, Feature (linguistics), Dynamic Bayesian network, Action (physics), Facial expression, Artificial intelligence, Bayesian probability, Natural language processing, Pattern recognition (psychology)

Metrics

Cited By: 23
FWCI (Field Weighted Citation Impact): 2.35
Refs: 111
Citation Normalized Percentile: 0.89

Topics

Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Multisensory perception and integration (Social Sciences → Psychology → Experimental and Cognitive Psychology)
Blind Source Separation Techniques (Physical Sciences → Computer Science → Signal Processing)