In recent years, human action recognition based on skeleton data from RGB-D sensors has achieved remarkable performance. However, recognition accuracy is limited primarily by the large amount of noise introduced during the acquisition of human skeleton data and the subsequent feature extraction. In this work, we present a spatio-temporal channel feature shrinkage network (STCSN), which dynamically learns thresholds for feature shrinkage along the channel and spatio-temporal dimensions in order to eliminate noisy or irrelevant features. The proposed STCSN introduces only a small number of parameters while delivering significant performance gains. By adding STCSN to a basic graph convolution network, we develop a robust graph convolution network, STCSN-GCN, that achieves substantial performance gains on the NW-UCLA and NTU RGB+D datasets.
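To make the idea of threshold-based feature shrinkage concrete, the following is a minimal NumPy sketch of channel-wise soft thresholding. It is an illustration under stated assumptions, not the STCSN architecture itself: the abstract does not specify the shrinkage function, so soft thresholding is assumed here as a common choice, and the names `soft_threshold`, `channel_shrinkage`, and `gate_logits` are hypothetical. In a trained network the gate would typically be predicted from the input by a small sub-network; fixed logits stand in for that here.

```python
import numpy as np


def soft_threshold(x, tau):
    """Shrink x toward zero by tau; entries with |x| <= tau become zero."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def channel_shrinkage(features, gate_logits):
    """Apply one threshold per channel (assumed form, not taken from the paper).

    features:    array of shape (C, T, V) = channels x frames x joints.
    gate_logits: array of shape (C,), a hypothetical stand-in for the
                 learned quantities that set each channel's threshold.
    """
    # Mean absolute activation per channel gives the scale of the threshold.
    scale = np.abs(features).mean(axis=(1, 2))      # shape (C,)
    # A sigmoid keeps the gate in (0, 1), so the threshold never exceeds
    # the channel's mean absolute activation.
    gate = 1.0 / (1.0 + np.exp(-gate_logits))       # shape (C,)
    tau = (scale * gate)[:, None, None]             # broadcast over T and V
    return soft_threshold(features, tau)


# Usage: random features for 4 channels, 8 frames, 25 joints.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8, 25))
shrunk = channel_shrinkage(feats, gate_logits=np.zeros(4))
```

Small-magnitude activations, which are more likely to be noise, are set to zero, while larger ones are kept with reduced magnitude. A spatio-temporal variant could apply the same operation with thresholds indexed by frame and joint rather than by channel.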
Wenwen Ding, Kai Liu, Fei Cheng, Jin Zhang
Runjie Li, Ning He, Jinhua Wang, Fengxi Sun, Hongfei Liu
Runjie Li, Ning He, Chaoqun Wang, Ruicheng Wang, Wenhua Wang
Haiming Sun, Song Wang, Zhenming Zhang, Jiye Yan, Lin Ma, Weiyao Xu