Facial expression recognition (FER) plays a crucial role in human-computer interaction, and it is challenging because face appearance varies drastically across head poses. To classify expressions under arbitrary poses, we present an end-to-end encoder-decoder network that leverages both 2D and 3D modalities to perform facial expression recognition and 3D Morphable Model (3DMM) expression part reconstruction simultaneously. Specifically, an encoder regresses expression representations from 2D images, and a decoder recovers the 3DMM expression parts from the corresponding expression representations. The two components are trained jointly, with an expression classification loss explicitly enforced on the expression representations. To handle the lack of non-frontal views in FER databases, we also generate profile views of face images through out-of-plane rotation. As a result, the learned expression representations are discriminative, generative, and robust to pose variations. On the extended CK+ and Oulu-CASIA databases, the proposed method outperforms ExpNet by 34.20% and 30.56%, respectively, demonstrating its superiority.
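The joint training described above amounts to one objective with two terms: a classification loss on the expression representation and a reconstruction loss on the decoded 3DMM expression part. The sketch below is a minimal, framework-free illustration under stated assumptions, not the authors' implementation: the encoder, classifier head, and decoder are plain linear maps (real networks would be deep), reconstruction uses mean squared error, and `lam` is a hypothetical weight balancing the two terms. The abstract does not specify the exact losses or their weighting, and all function names here are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Dense matrix-vector product: each row of w dotted with x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Numerically stable -log(softmax(logits)[label])."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(
    image_feat: Sequence[float],
    label: int,
    target_expr: Sequence[float],
    w_enc: Matrix,
    w_cls: Matrix,
    w_dec: Matrix,
    lam: float = 1.0,  # hypothetical weight on the reconstruction term
) -> float:
    """Classification loss on z plus weighted reconstruction loss from z.

    Assumption: linear encoder/classifier/decoder stand in for the
    deep networks described in the abstract.
    """
    z = matvec(w_enc, image_feat)  # encoder: 2D input -> expression representation
    logits = matvec(w_cls, z)      # classifier head enforced on the representation
    recon = matvec(w_dec, z)       # decoder: representation -> 3DMM expression part
    return softmax_cross_entropy(logits, label) + lam * mse(recon, target_expr)


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    print(joint_loss([1.0, 0.0], 0, [1.0, 0.0], eye, eye, eye))
```

Because both terms are computed from the same representation `z`, minimizing their sum pushes `z` to be both discriminative (through the classifier) and generative (through the decoder), which is the property the abstract claims for the learned representations.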
Feifei Zhang, Yongbin Yu, Qirong Mao, Jianping Gou, Yongzhao Zhan
Qiong Hu, Xi Peng, Peng Yang, Fei Yang, Dimitris Metaxas
Can Wang, Shangfei Wang, Guang Liang