Since its release in late 2010, the Microsoft Kinect depth sensor has boosted real-time gesture recognition and new man-machine interaction endeavors in the computer vision community. In this paper we propose an accurate, fast, and robust face pose estimation approach based on depth image data, which can be of interest for user behavior analysis or serve as a man-machine interaction modality. In our method we apply the depth sensor to create a user-specific model, consisting of point vertices and surface normals, which is fitted with an Iterative Closest Point (ICP) algorithm. In the fitting procedure we employ the normal vectors to minimize the distances between the model and the measured point cloud. As the experimental results show, our method is precise, fast, and robust under strong head rotation, even during facial expressions and partial face occlusion.
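Using surface normals to minimize model-to-cloud distances, as described above, corresponds to the point-to-plane variant of ICP. The sketch below shows one linearized point-to-plane ICP step in Python with NumPy; the function name, the brute-force nearest-neighbor search, and the small-angle linearization are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def point_to_plane_icp_step(model_pts, model_normals, cloud_pts):
    """One linearized point-to-plane ICP step (illustrative sketch).

    Solves, under a small-angle approximation, for rotation (rx, ry, rz)
    and translation t minimizing
        sum_i ( n_i . (R @ p_i + t - q_i) )^2,
    where q_i is the closest measured point to model vertex p_i and
    n_i is the model's surface normal at p_i.
    """
    # Nearest-neighbor correspondences (brute force for clarity;
    # a k-d tree would be used in practice).
    d2 = ((model_pts[:, None, :] - cloud_pts[None, :, :]) ** 2).sum(-1)
    q = cloud_pts[d2.argmin(axis=1)]

    # Linearized residual: n.(p - q) + (p x n).r + n.t, giving a
    # least-squares system A x = b with x = [rx, ry, rz, tx, ty, tz].
    c = np.cross(model_pts, model_normals)          # rotation coefficients
    A = np.hstack([c, model_normals])               # N x 6
    b = ((q - model_pts) * model_normals).sum(-1)   # N residuals
    x, *_ = np.linalg.lstsq(A, b, rcond=None)

    rx, ry, rz, tx, ty, tz = x
    # Small-angle rotation matrix (valid for the small per-step updates
    # ICP produces; a proper rotation would be re-orthonormalized).
    R = np.array([[1.0, -rz,  ry],
                  [ rz, 1.0, -rx],
                  [-ry,  rx, 1.0]])
    t = np.array([tx, ty, tz])
    return model_pts @ R.T + t
```

In a full ICP loop this step would be iterated, re-computing correspondences after each update, until the alignment error converges.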