Jianyang Xie, Yanda Meng, Yitian Zhao, Anh Nguyen, Xiaoyun Yang, Yalin Zheng
Human action recognition is an essential topic in computer vision and image processing. Graph convolutional networks (GCNs) have attracted significant attention and achieved noteworthy performance in skeleton-based human action recognition. However, most previous graph-based works refine the skeleton topology without considering the types of the different joints and edges or the occurrence order of the frames, which makes them insufficient to represent intrinsic semantic information. To address this, we propose a dynamic semantic-based spatial-temporal graph convolution network (DS-STGCN) with two dynamic semantic modules, one for the spatial context and one for the temporal context. Specifically, the joint and edge types are encoded implicitly in the spatial module, and the occurrence order of the frames is encoded implicitly in the temporal module. Extensive experiments on four datasets (NTU RGB+D 60, NTU RGB+D 120, Kinetics-400, and FineGYM) show that the two proposed semantic modules bring consistent recognition improvements across various backbones, and that DS-STGCN surpasses state-of-the-art methods on these datasets. Notably, on the more challenging Kinetics-400 dataset, our model outperforms other state-of-the-art GCN-based methods by a large margin. The code has been released at https://github.com/davelailai/DS-STGCN.
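The following is a minimal sketch, not the authors' released implementation (which is available at the GitHub link above), illustrating the general idea the abstract describes: injecting semantic information into skeleton features via learnable embeddings for joint type (spatial semantics) and frame position (temporal semantics) before graph convolution. The class and variable names here are hypothetical.

```python
import torch
import torch.nn as nn

class SemanticEmbedding(nn.Module):
    """Hypothetical sketch: add learnable joint-type and frame-order
    embeddings to skeleton features, so downstream spatial-temporal
    graph convolutions can distinguish joint types and frame order."""

    def __init__(self, num_joints: int, num_frames: int, channels: int):
        super().__init__()
        # One learnable vector per joint type (broadcast over frames)
        # and one per frame position (broadcast over joints).
        self.joint_type = nn.Parameter(torch.zeros(1, channels, 1, num_joints))
        self.frame_order = nn.Parameter(torch.zeros(1, channels, num_frames, 1))
        nn.init.normal_(self.joint_type, std=0.02)
        nn.init.normal_(self.frame_order, std=0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, joints) skeleton feature tensor.
        return x + self.joint_type + self.frame_order

# Usage: embed semantics into features ahead of a GCN block.
x = torch.randn(8, 64, 32, 25)      # 8 clips, 64 channels, 32 frames, 25 joints
sem = SemanticEmbedding(num_joints=25, num_frames=32, channels=64)
print(sem(x).shape)                  # torch.Size([8, 64, 32, 25])
```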