In this paper, we present a system for estimating human head pose from multiple camera views. We apply a neural network to each view and fuse the outputs in a Bayesian filter framework, which yields a more robust estimate than purely monocular approaches. The system is evaluated on low-resolution seminar video recordings with poor lighting, in which the captured head size varies around 20 × 25 pixels. Overall, 39.4% of all frames were classified correctly (one of eight classes). When neighbouring classes were also accepted as correct, 73.4% of the frames were correctly classified.
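The fusion idea summarised above can be illustrated with a minimal sketch of a discrete Bayes filter over eight pose classes. It is not the authors' implementation: the abstract does not specify the motion model, the form of the network outputs, or how views are aligned. The sketch assumes that each view's network yields a score per class already expressed in a common reference frame, that views are conditionally independent, and that the head either stays in its class or moves to a circular neighbour between frames. All names (`predict`, `update`, `step`, `is_correct`) and the parameter values (`stay`, `eps`) are hypothetical. The helper `is_correct` shows one plausible reading of the two reported accuracy figures (exact class versus exact or neighbouring class).

```python
from typing import List, Sequence

NUM_CLASSES = 8  # eight head pose classes, as stated in the abstract


def normalize(p: Sequence[float]) -> List[float]:
    """Scale non-negative scores so that they sum to one."""
    total = sum(p)
    if total <= 0:
        raise ValueError("distribution must have positive mass")
    return [x / total for x in p]


def predict(belief: Sequence[float], stay: float = 0.6) -> List[float]:
    """Assumed motion model: keep the class with probability `stay`,
    otherwise move to one of the two circular neighbours."""
    n = len(belief)
    move = (1.0 - stay) / 2.0
    return [
        stay * belief[i] + move * belief[(i - 1) % n] + move * belief[(i + 1) % n]
        for i in range(n)
    ]


def update(prior: Sequence[float],
           view_outputs: Sequence[Sequence[float]],
           eps: float = 1e-6) -> List[float]:
    """Multiply the prior by every view's class scores (assumed
    conditionally independent) and renormalize."""
    posterior = list(prior)
    for scores in view_outputs:
        # eps keeps a single overconfident view from zeroing out a class
        posterior = [p * (s + eps) for p, s in zip(posterior, scores)]
    return normalize(posterior)


def step(belief: Sequence[float],
         view_outputs: Sequence[Sequence[float]]) -> List[float]:
    """One filter iteration for one frame: predict, then fuse all views."""
    return update(predict(belief), view_outputs)


def is_correct(pred: int, truth: int,
               allow_neighbours: bool = False,
               n: int = NUM_CLASSES) -> bool:
    """Exact match, or match within one class on the circle of n classes."""
    d = (pred - truth) % n
    return min(d, n - d) <= (1 if allow_neighbours else 0)


# Usage with made-up scores from two camera views for a single frame
belief = [1.0 / NUM_CLASSES] * NUM_CLASSES
views = [
    [0.05, 0.10, 0.50, 0.20, 0.05, 0.04, 0.03, 0.03],
    [0.05, 0.30, 0.35, 0.15, 0.05, 0.04, 0.03, 0.03],
]
belief = step(belief, views)
estimate = max(range(NUM_CLASSES), key=belief.__getitem__)
```

Multiplying the per-view scores lets views that agree reinforce each other, while the prediction step carries information across frames, which is one way a multi-view estimate can be more stable than a single-view one.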
Michael Voit, Kai Nickel, Rainer Stiefelhagen