Actions by humans in real-world settings involve large changes in the person's pose and in their orientation relative to the camera. Person tracking algorithms often fail under such conditions because they detect and track people in only a few known poses (typically standing). Further, because of occlusions and the similarity of clothing to the background, foreground silhouettes are typically very noisy. We present an approach that addresses these problems by first accurately tracking a person through changes in pose and fragmented foreground blobs. During tracking, we also estimate the relative orientation and scale of the person. In each frame, we represent the person's pose within the track window using a grid-of-centroids model and recognize the action by matching it against a set of keyposes. We tested our approach on a dataset collected in a real grocery store and report a frame-by-frame action recognition accuracy of about 82.5% or better.
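The abstract does not specify the exact form of the grid-of-centroids descriptor or of the keypose matching. The sketch below shows one plausible reading, under stated assumptions: the binary silhouette inside the track window is split into a fixed grid (4 x 4 here, an assumption), the centroid of the foreground pixels in each cell is recorded in cell-normalized coordinates, and each frame is labeled by its nearest keypose under Euclidean distance (also an assumption). Empty cells are assigned the cell centre, which is a hypothetical choice. The function names `grid_of_centroids`, `classify_frame`, and `frame_accuracy` are illustrative only and are not taken from the paper.

```python
import numpy as np


def grid_of_centroids(mask, rows=4, cols=4):
    """Hypothetical grid-of-centroids descriptor for a binary silhouette.

    The track window (mask) is split into rows x cols cells. For each cell,
    the centroid of its foreground pixels is stored as (y, x) normalized to
    [0, 1] within the cell. Assumption: empty cells get the cell centre.
    """
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    # Integer cell boundaries; linspace copes with windows not divisible by the grid.
    r_edges = np.linspace(0, h, rows + 1).astype(int)
    c_edges = np.linspace(0, w, cols + 1).astype(int)
    feats = []
    for i in range(rows):
        for j in range(cols):
            cell = mask[r_edges[i]:r_edges[i + 1], c_edges[j]:c_edges[j + 1]]
            ys, xs = np.nonzero(cell)
            if ys.size == 0:
                feats.extend([0.5, 0.5])  # assumed placeholder for empty cells
            else:
                ch, cw = cell.shape
                # Use pixel centres (+0.5) so a full cell maps to (0.5, 0.5).
                feats.extend([(ys.mean() + 0.5) / ch, (xs.mean() + 0.5) / cw])
    return np.array(feats)


def classify_frame(mask, keyposes, labels, rows=4, cols=4):
    """Label one frame by its nearest keypose (assumed Euclidean 1-NN).

    keyposes: array of shape (K, 2 * rows * cols) of keypose descriptors.
    labels: sequence of K action labels, one per keypose.
    """
    f = grid_of_centroids(mask, rows, cols)
    dists = np.linalg.norm(np.asarray(keyposes) - f, axis=1)
    return labels[int(np.argmin(dists))]


def frame_accuracy(pred, truth):
    """Fraction of frames whose predicted label matches the ground truth."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    return float(np.mean(pred == truth))
```

In this sketch, keyposes would be descriptors computed from representative training silhouettes, and `frame_accuracy` is one way to compute a frame-by-frame accuracy figure like the one reported above. Note that filling empty cells with the cell centre makes a completely empty cell indistinguishable from a uniformly filled one; adding a per-cell occupancy value to the descriptor is one possible remedy, again an assumption rather than the paper's method.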
Qingxiang Wang, Hiroshi Hanaizumi
Fan Yang, Shigeyuki Odashima, Sosuke Yamao, Hiroaki Fujimoto, Shoichi Masui, Shan Jiang
Alexandros André Chaaraoui, Pau Climent-Pérez, Francisco Flórez-Revuelta