Tuo Zang, Jian‐Feng Tu, Mengran Duan, Lingfeng Liu
The graph convolution network (GCN) is a commonly used method in skeleton-based action recognition, but extracting complex spatio-temporal features effectively enough to improve recognition accuracy remains an open problem. In this paper, we propose (1) a difference temporal graph convolution (DT-GCN) and (2) a multi-scale interactional channel excitation (MS-ICE) attention layer. DT-GCN uses temporal differences to compute an adjacency matrix for each frame, which increases the flexibility of spatial modeling. The MS-ICE module uses multi-scale convolution kernels to capture sensitive relationships between the spatio-temporal and channel dimensions. In addition, since angle information responds well to motion variations between joints, we use angles together with conventional articulated skeletons to form a multi-stream fusion framework. Combining these components, we design a multi-stream, temporal-channel enhanced graph convolution network (MS-TCEGCN); extensive experiments on the NTU-RGB+D dataset show that our model performs at a state-of-the-art level.
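As a rough illustration of the idea behind DT-GCN, the sketch below derives one adjacency matrix per frame from temporal differences of joint coordinates. The abstract does not specify how the differences are turned into an adjacency matrix, so the dot-product affinity and row-wise softmax used here are assumptions, and the function names `temporal_difference` and `per_frame_adjacency` are hypothetical.

```python
import math


def temporal_difference(frames):
    """Per-joint motion vectors: frame t minus frame t-1 (zeros for t = 0)."""
    diffs = []
    for t, frame in enumerate(frames):
        prev = frames[t - 1] if t > 0 else frame
        diffs.append([
            tuple(c - p for c, p in zip(cur, old))
            for cur, old in zip(frame, prev)
        ])
    return diffs


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def per_frame_adjacency(frames):
    """Build one V x V adjacency matrix per frame.

    Assumption (not from the paper): the affinity between joints i and j
    is the dot product of their motion vectors, normalized per row with
    a softmax so that each row sums to 1.
    """
    adjacencies = []
    for motion in temporal_difference(frames):
        scores = [
            [sum(a * b for a, b in zip(mi, mj)) for mj in motion]
            for mi in motion
        ]
        adjacencies.append([softmax(row) for row in scores])
    return adjacencies


# Toy input: 2 frames, 3 joints, 3D coordinates.
# Joints 0 and 1 move in the same direction; joint 2 moves the opposite way.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
adj = per_frame_adjacency(frames)
```

Under these assumptions, joints that move together in a given frame receive larger mutual weights in that frame's adjacency matrix, so the graph structure varies over time instead of being fixed by the skeleton topology.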
Fanjia Li, Aichun Zhu, Yonggang Xu, Ran Cui, Gang Hua
Siyue Lei, Bin Tang, Yanhua Chen, Mingfu Zhao, Yifei Xu, Zourong Long
Haiping Zhang, Xu Liu, Dongjin Yu, Liming Guan, Dongjing Wang, Conghao Ma, Zepeng Hu