Human pose estimation has received a great deal of attention in human activity recognition and in many other applications. It remains a challenging problem, however, because of complex backgrounds, appearance variations across poses, self-occlusion, occlusion by objects, low image resolution, and varying illumination. To overcome these limitations and improve performance, a model is introduced that estimates human pose using deep convolutional neural networks. The model follows a bottom-up parsing strategy: it produces features for extracting skeletal keypoints together with a keypoint association vector field, using a non-parametric description. Refining the predictions over multiple stages further improves keypoint localization accuracy. The proposed model is trained and tested on the benchmark MPII Human Pose dataset, where it exceeds the current state-of-the-art algorithm in accuracy. Moreover, it detects some occluded keypoints by incorporating an occlusion network into the feature representation of the keypoints, resulting in 89.0% Mean Average Precision.
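As an illustration of how a keypoint association vector field can be used in bottom-up parsing, the following is a minimal sketch and not the model's actual implementation. It assumes the field is stored as a two-channel array of x- and y-components and that a candidate pair of keypoints is scored by averaging the field's alignment with the segment joining them, in the manner of part affinity fields. The function name `association_score` and the parameter `num_samples` are hypothetical.

```python
import numpy as np


def association_score(field, p1, p2, num_samples=10):
    """Score how well a 2D vector field supports a limb from p1 to p2.

    field: array of shape (2, H, W) holding x- and y-components (assumed layout).
    p1, p2: (x, y) pixel coordinates of two candidate keypoints.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    limb = p2 - p1
    length = np.linalg.norm(limb)
    if length == 0:
        return 0.0  # coincident points carry no direction
    unit = limb / length

    # Sample evenly spaced points along the segment p1 -> p2.
    ts = np.linspace(0.0, 1.0, num_samples)
    pts = p1[None, :] + ts[:, None] * limb[None, :]

    # Nearest-pixel lookup, clipped to the image bounds.
    xs = np.clip(np.rint(pts[:, 0]).astype(int), 0, field.shape[2] - 1)
    ys = np.clip(np.rint(pts[:, 1]).astype(int), 0, field.shape[1] - 1)
    vecs = field[:, ys, xs]  # shape (2, num_samples)

    # Mean dot product between the sampled field vectors and the limb direction.
    return float(np.mean(vecs[0] * unit[0] + vecs[1] * unit[1]))
```

Under these assumptions, a field that points uniformly in the direction of the candidate limb yields a score near 1, a perpendicular field yields a score near 0, and an opposing field yields a negative score. Candidate pairs could then be ranked by this score before being assembled into skeletons.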