Ruwen Bai, Min Li, Bo Meng, Fengfa Li, Miao Jiang, Junxing Ren, Degang Sun
Graph convolutional networks (GCNs) have emerged as dominant methods for skeleton-based action recognition. However, they still suffer from two problems, namely, neighborhood constraints and entangled spatiotemporal feature representations. Most studies have focused on improving the design of graph topology to solve the first problem but they have yet to fully explore the latter. In this work, we design a disentangled spatiotemporal transformer (DSTT) block to overcome the above limitations of GCNs in three steps: (i) feature disentanglement for spatiotemporal decomposition; (ii) global spatiotemporal attention for capturing correlations in the global context; and (iii) local information enhancement for utilizing more local information. Thereon, we propose a novel architecture, named Hierarchical Graph Convolutional skeleton Transformer (HGCT), to employ the complementary advantages of GCN (i.e., local topology, temporal dynamics and hierarchy) and Transformer (i.e., global context and dynamic attention). HGCT is lightweight and computationally efficient. Quantitative analysis demonstrates the superiority and good interpretability of HGCT.
Shuo Chen, Ke Xu, Bo Zhu, Xinghao Jiang, Tanfeng Sun
Linjiang Huang, Yan Huang, Wanli Ouyang, Liang Wang
Changxiang He, Shuting Liu, Ying Zhao, Xiaofei Qin, Jiayuan Zeng, Xuedian Zhang
Yujian Jiang, Zhaoneng Sun, Saisai Yu, Shuang Wang, Yang Song