Video interest points, in combination with local appearance descriptors, are used for human action recognition. Most previously proposed video interest point detectors are straightforward extensions of an image interest point detector, and they treat the temporal (inter-frame) dimension in the same way as the spatial (intra-frame) dimensions. We argue that certain unique properties of the temporal dimension call for a different treatment. To exploit these properties, we propose an interest point detector based on the vector calculus of optical flow. Compared to previously proposed methods, the proposed method exhibits higher repeatability (robustness) and lower displacement (stability) of interest points under the two common video transformations tested: video compression and spatial scaling. It also shows competitive action recognition performance when paired with appropriate feature descriptors in a bag-of-features model.
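The abstract does not specify which vector-calculus quantities the detector uses. The following is a minimal sketch, assuming that interest points are taken as local maxima of the combined magnitude of the divergence and curl of a dense optical-flow field. The function name `flow_interest_points` and its `rel_threshold` and `radius` parameters are hypothetical, and the flow components `u` and `v` are assumed to come from any dense optical-flow method. This is not presented as the authors' detector.

```python
import numpy as np


def flow_interest_points(u, v, rel_threshold=0.5, radius=2):
    """Hypothetical sketch: interest points as local maxima of the combined
    divergence/curl magnitude of a dense optical-flow field (u, v).

    u, v: 2-D float arrays with the horizontal (x) and vertical (y) flow
    components between two consecutive frames, from any optical-flow method.
    Returns a list of (row, col) positions.
    """
    # np.gradient differentiates along axis 0 (rows, y) first, then axis 1 (cols, x).
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)

    divergence = du_dx + dv_dy  # expanding / contracting motion
    curl = dv_dx - du_dy        # rotating motion (z-component of the curl)
    response = np.hypot(divergence, curl)

    peak = response.max()
    if peak == 0:
        return []  # e.g. uniform translation: no divergence or curl anywhere
    cutoff = rel_threshold * peak  # assumed relative threshold on the response

    # Non-maximum suppression in a (2*radius+1) x (2*radius+1) window.
    padded = np.pad(response, radius, mode="constant", constant_values=-np.inf)
    size = 2 * radius + 1
    points = []
    for r in range(response.shape[0]):
        for c in range(response.shape[1]):
            value = response[r, c]
            if value >= cutoff and value >= padded[r:r + size, c:c + size].max():
                points.append((r, c))
    return points


if __name__ == "__main__":
    # Synthetic Gaussian-weighted vortex centred at (10, 10) on a 21x21 grid.
    yy, xx = np.mgrid[0:21, 0:21].astype(float)
    g = np.exp(-((xx - 10) ** 2 + (yy - 10) ** 2) / (2 * 3.0 ** 2))
    print(flow_interest_points(-(yy - 10) * g, (xx - 10) * g))  # [(10, 10)]
```

Divergence responds to expanding or contracting motion and curl to rotating motion, and both vanish for uniform translation, which makes them plausible candidates for motion-specific saliency. A practical detector would likely also need multi-scale smoothing of the flow and aggregation across several frames, which this sketch omits.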
Kevis Maninis, Petros Koutras, Petros Maragos
Yuanbo Chen, Zhixuan Li, Xin Guo, Yanyun Zhao, Anni Cai