In recent years, human action recognition (HAR) has found its way into a diverse range of applications such as video surveillance, gaming, and robotics. Input data for HAR come from different modalities, such as RGB, depth, or skeletal data, depending on the type of camera in use. Skeletal data have recently attracted great interest due to their computational and storage efficiency, prompting considerable computer vision research on skeleton-based HAR. In this paper, we propose a new action recognition scheme that combines Relative Joint Positions with temporal derivatives of the joint positions, namely joint velocity and joint acceleration. Through handcrafted feature extraction, the proposed scheme benefits from representations in both the spatial and temporal domains. Our method outperforms state-of-the-art methods on five benchmark datasets: MSR-Action3D, UTKinect-Action, Florence3D-Action, MSR Action Pairs, and G3D-Gaming. On MSR-Action3D, it achieves an average accuracy of 96.88% over the three action subsets. The largest improvement, 2.98% over existing methods, is observed on Action Set 1 of MSR-Action3D.
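To make the feature construction concrete, the following is a minimal Python sketch of the general idea described above: relative joint positions together with their first and second temporal derivatives. The reference joint, the use of finite differences, and the per-frame concatenation are assumptions for illustration, not the paper's exact pipeline, which may apply a different normalization and further encoding before classification.

```python
# Minimal sketch (not the authors' exact pipeline) of relative joint
# positions plus joint velocity and joint acceleration features.
import numpy as np

def skeleton_features(joints, ref_idx=0):
    """joints: (T, J, 3) array of 3D joint positions over T frames.

    ref_idx: index of the reference joint (e.g., hip center) -- an
    assumption; the paper may use a different reference or normalization.
    Returns a (T, J*9) feature matrix: relative positions, velocities,
    and accelerations concatenated per frame.
    """
    # Relative joint positions: subtract the reference joint per frame.
    rel = joints - joints[:, ref_idx:ref_idx + 1, :]

    # Joint velocity: first temporal derivative via finite differences.
    vel = np.gradient(rel, axis=0)

    # Joint acceleration: second temporal derivative.
    acc = np.gradient(vel, axis=0)

    T = joints.shape[0]
    return np.concatenate([rel.reshape(T, -1),
                           vel.reshape(T, -1),
                           acc.reshape(T, -1)], axis=1)

# Example: 60 frames of a 20-joint skeleton (MSR-Action3D skeletons
# have 20 joints); the resulting features have shape (60, 180).
seq = np.random.rand(60, 20, 3)
feats = skeleton_features(seq)
```

In such a scheme, the relative positions capture the spatial configuration of the pose, while the velocity and acceleration terms capture the temporal dynamics, which matches the abstract's claim of representation in both domains.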