The locations of interest points are an important cue for action recognition. To model their spatio-temporal distribution, we propose a novel position feature constructed from the normalized pairwise relative positions of points. The Vector of Locally Aggregated Descriptors (VLAD), which aggregates the differences between local descriptors and their assigned visual words, has achieved promising performance. However, the original VLAD assigns equal weight to every difference vector and ignores the zero-order statistics of the local descriptors. In this paper, we present Generalized VLAD (GVLAD), an extension of VLAD that encodes the position features as well as local appearance descriptors, taking different weights and zero-order information into account simultaneously. State-of-the-art performance on two benchmark datasets validates the effectiveness of the proposed method.
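For context, the standard VLAD encoding that GVLAD generalizes can be sketched as follows. This is a minimal NumPy illustration of the baseline (hard assignment, first-order residual accumulation, power and L2 normalization), not the paper's GVLAD; the function and variable names are illustrative assumptions.

```python
import numpy as np

def vlad_encode(descriptors, centroids):
    """Baseline VLAD: accumulate the residuals between each local
    descriptor and its nearest visual word, then power- and
    L2-normalize the concatenated vector."""
    K, d = centroids.shape
    # Hard-assign each descriptor to its nearest centroid.
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    assign = dists.argmin(axis=1)
    v = np.zeros((K, d))
    for k in range(K):
        members = descriptors[assign == k]
        if len(members):
            # First-order statistics only: sum of residuals per word.
            v[k] = (members - centroids[k]).sum(axis=0)
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))  # power (signed square-root) normalization
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

Note that every residual contributes with equal weight and the per-word descriptor counts (zero-order statistics) are discarded; these are the two limitations the abstract says GVLAD addresses.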