Shilei Cheng, Guoyi Qin, Siqi Li, Mei Xie, Zheng Ma
Characterizing videos with both spatial and temporal information is important, especially for human action recognition: spatial cues model human appearance, while dynamic motion needs to be represented by temporal cues. The vector of locally aggregated descriptors (VLAD), whose assignment step lacks temporal information, can therefore be regarded as a suboptimal solution for action recognition. In this paper, VLAD with a soft spatio-temporal assignment, named VLAD-SSTA, is proposed to further boost action recognition performance by making the soft assignment depend on spatio-temporal characteristics. Specifically, a Spatio-Temporal Aware module is devised from a series of 3D convolutions to capture these characteristics. Experimental results show that the proposed approach yields state-of-the-art performance on challenging datasets.
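To make the idea of a soft assignment concrete, the following is a minimal sketch of soft-assignment VLAD in plain Python. It is not the authors' implementation: the abstract states that VLAD-SSTA derives its assignment weights from 3D convolutions, whereas this sketch assumes a simple softmax over negative squared distances to the cluster centers as a stand-in. The function names `soft_assign` and `soft_vlad` and the parameter `alpha` are hypothetical.

```python
import math


def soft_assign(x, centers, alpha=1.0):
    """Soft assignment of descriptor x to each center.

    Assumption: weights are a softmax over negative squared distances.
    VLAD-SSTA instead predicts them with 3D convolutions (per the abstract).
    """
    logits = [-alpha * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
              for c in centers]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vlad(descriptors, centers, alpha=1.0):
    """Accumulate assignment-weighted residuals per center, then L2-normalize."""
    k, d = len(centers), len(centers[0])
    v = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        a = soft_assign(x, centers, alpha)
        for j, c in enumerate(centers):
            for t in range(d):
                v[j][t] += a[j] * (x[t] - c[t])
    flat = [val for row in v for val in row]
    norm = math.sqrt(sum(val * val for val in flat))
    return [val / norm for val in flat] if norm > 0 else flat


# Toy usage: two 2-D centers and two local descriptors.
centers = [[0.0, 0.0], [1.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.0]]
encoding = soft_vlad(descriptors, centers)
print(len(encoding))  # k * d values
```

In a video setting the descriptors would be local features taken at (time, height, width) positions of a feature map. Because the weights in this sketch depend only on each descriptor's distance to the centers, they ignore neighboring positions in space and time, which is the limitation the abstract attributes to standard VLAD.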
Ionuţ Cosmin Duţă, Bogdan Ionescu, Kiyoharu Aizawa, Nicu Sebe
Shilei Cheng, Mei Xie, Zheng Ma, Siqi Li, Song Gu, Yang Feng
Qinghui Li, Aihua Li, Zhigao Cui, Yanzhao Su