In this paper, we present an approach to estimating the 3D position and orientation of the head from RGB and depth images captured by a commercial Kinect sensor. To improve the robustness and accuracy of head pose estimation, we combine 2D scale-invariant feature transform (SIFT) features with 3D histogram of oriented gradients (HOG) features, both extracted from a pair of synchronously captured RGB and depth images; we refer to this combination as SIFT-HOG features. We formulate pose estimation as a regression problem and solve it with random forests, chosen for their ability to handle large training sets and their fast mapping from features to pose. The mean-shift method is then employed to refine the result obtained by the random forests. Experimental results demonstrate that our head pose estimation approach is efficient.
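The abstract does not specify how mean-shift refines the random forest output. One common scheme treats each tree's prediction as a vote and moves to the densest mode of those votes, so that a few outlying trees do not drag the estimate the way a plain average would. The sketch below shows that idea with NumPy under stated assumptions: the vote layout (for example x, y, z, yaw, pitch, roll), the Gaussian kernel, the bandwidth, and the median starting point are all hypothetical choices, not details taken from the paper. It also treats angles as ordinary Euclidean coordinates, ignoring wrap-around, and assumes the coordinates have been scaled so that one bandwidth suits all of them.

```python
import numpy as np


def mean_shift_mode(votes, bandwidth=1.0, max_iter=100, tol=1e-6):
    """Return the dominant mode of a set of pose votes via Gaussian mean-shift.

    votes: (n, d) array, one row per tree prediction (hypothetical layout).
    bandwidth: kernel width; an assumed value, not one from the paper.
    """
    votes = np.asarray(votes, dtype=float)
    # Start from the coordinate-wise median, which outliers disturb less than the mean.
    x = np.median(votes, axis=0)
    for _ in range(max_iter):
        # Gaussian weight of each vote given its distance to the current estimate.
        sq_dist = np.sum((votes - x) ** 2, axis=1)
        w = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
        if w.sum() == 0.0:
            break  # every vote is far outside the kernel; keep the current estimate
        # Mean-shift update: the next estimate is the kernel-weighted mean of the votes.
        x_new = w @ votes / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

For example, with four tree votes clustered near the origin and two outliers near (10, 10, 10), `mean_shift_mode` settles near the cluster center, while `votes.mean(axis=0)` is pulled roughly a third of the way toward the outliers.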
Nastaran Ghadarghadar, Esra Ataer-Cansızoğlu, Peng Zhang, Deniz Erdoğmuş