Action recognition is an important research area in computer vision, yet current mainstream methods place insufficient emphasis on local features. Some approaches attend to local action features by dividing the predefined human skeleton into parts such as the left and right hands and the left and right legs. However, each of these parts contains few skeleton keypoints, so the resulting action features tend to be similar and recognition efficiency is lower. Moreover, existing methods based on local action features often neglect global posture characteristics, which makes recognition accuracy unstable. To address these issues, this study proposes a graph-convolution-based method for refining local features in action recognition. The method divides the predefined human skeleton topology into the body and the upper and lower limbs, enhancing the model's ability to focus on local action features. A local feature refiner then uses contrastive learning to widen the differences between the local features of different types of actions and reduce the differences between similar actions, which addresses the feature similarity introduced by the partitioning strategy. Finally, the classification results of the upper and lower limbs are combined with those of the body, so that global pose features are fully used to improve model stability. Experimental results show that the method achieves recognition accuracies of 93.0% and 98.8% on the X-Sub and X-View benchmarks of the NTU RGB+D 60 dataset, respectively, and 88.8% and 90.1% on the X-Sub and X-Set benchmarks of the NTU RGB+D 120 dataset, respectively, an effective improvement in action recognition accuracy.
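The partition-and-combine idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the joint assignment in `PARTS`, the treatment of "body" as the whole 25-joint skeleton, the weighted sum of softmax scores, and the names `select_joints`, `fuse_scores`, and `predict` are all assumptions made for illustration. The graph-convolution streams and the contrastive local feature refiner are omitted; the per-stream logits are taken as given.

```python
import math

# Hypothetical partition of a 25-joint skeleton (0-based indices, assuming the
# commonly documented NTU RGB+D joint order). The abstract does not give the
# exact joint assignment, so these lists are assumptions.
PARTS = {
    "upper_limbs": [4, 5, 6, 7, 8, 9, 10, 11, 21, 22, 23, 24],  # arms and hands
    "lower_limbs": [12, 13, 14, 15, 16, 17, 18, 19],            # hips to feet
    "body": list(range(25)),  # assumed: whole skeleton, keeps global posture
}


def select_joints(frame, part):
    """Return the joints of one frame (a list of 25 coordinates) in `part`."""
    return [frame[i] for i in PARTS[part]]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(logits_by_part, weights):
    """Weighted sum of per-stream softmax scores (an assumed fusion rule)."""
    num_classes = len(next(iter(logits_by_part.values())))
    fused = [0.0] * num_classes
    for part, logits in logits_by_part.items():
        for c, p in enumerate(softmax(logits)):
            fused[c] += weights[part] * p
    return fused


def predict(logits_by_part, weights):
    """Index of the class with the highest fused score."""
    fused = fuse_scores(logits_by_part, weights)
    return max(range(len(fused)), key=fused.__getitem__)
```

In this sketch, giving the body stream a larger weight lets global posture dominate when the limb streams are uncertain. The weights would in practice be tuned on validation data.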
Yuxin Chen, Ziqi Zhang, Chunfeng Yuan, Bing Li, Ying Deng, Weiming Hu
SUN Qixiang, HE Ning, ZHANG Congcong, LIU Shengjie
Shucheng Xie, Shengze Li, Peng Chen, Bing Wang, Jun Zhang
Yuchen Liu, Jianshe Dong, Jia Meng, Chengxu Liang
Huaijun Wang, Bingqian Bai, Junhuai Li, Ke Hui, Xiang Wei