JOURNAL ARTICLE

Encoding spatio-temporal distribution by generalized VLAD for action recognition

Abstract

The location of interest points is an important cue for action recognition. To model their spatio-temporal distribution, we propose a novel position feature constructed from the normalized pairwise relative positions of the points. The Vector of Locally Aggregated Descriptors (VLAD), which aggregates the differences between descriptors and visual words, has achieved promising performance. However, the original VLAD assigns equal weights to all difference vectors and ignores the zero-order statistics of local descriptors. In this paper, we present Generalized VLAD (GVLAD), an extension of VLAD that encodes the position features as well as local appearance descriptors while taking both different weights and zero-order information into account. State-of-the-art performance on two benchmark datasets validates the effectiveness of the proposed method.
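As a rough illustration of the quantities the abstract refers to, the following sketch computes standard VLAD (the sum of differences between descriptors and their nearest visual words), the zero-order statistics that standard VLAD discards (the number of descriptors assigned to each word), and one possible reading of "normalized pairwise relative positions". It is not the GVLAD formulation itself, which the abstract does not specify. The function names, the L2 normalization, and the choice to normalize offsets by the per-axis extent of the point set are all assumptions made for illustration.

```python
import numpy as np


def vlad_encode(descriptors, codebook):
    """Standard VLAD plus per-word counts (zero-order statistics).

    descriptors: (N, D) local descriptors; codebook: (K, D) visual words.
    Returns the L2-normalized VLAD vector of length K*D and the (K,) counts.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance from every descriptor to every word: (N, K).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    num_words, dim = codebook.shape
    vlad = np.zeros((num_words, dim))
    for k in range(num_words):
        members = descriptors[nearest == k]
        if len(members):
            # Every difference vector contributes with equal weight.
            vlad[k] = (members - codebook[k]).sum(axis=0)
    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    if norm > 0:
        vlad = vlad / norm
    # Zero-order information: how many descriptors fell on each word.
    counts = np.bincount(nearest, minlength=num_words)
    return vlad, counts


def pairwise_relative_positions(points):
    """Hypothetical position feature: offsets p_j - p_i for all i != j,
    divided by the per-axis extent of the point set (an assumed
    normalization, not one stated in the abstract).

    points: (n, 3) array of (x, y, t) interest-point locations.
    """
    points = np.asarray(points, dtype=float)
    diffs = points[None, :, :] - points[:, None, :]  # diffs[i, j] = p_j - p_i
    off_diagonal = ~np.eye(len(points), dtype=bool)
    diffs = diffs[off_diagonal]                      # (n * (n - 1), 3)
    extent = points.max(axis=0) - points.min(axis=0)
    extent[extent == 0] = 1.0                        # avoid division by zero
    return diffs / extent
```

In a pipeline of this kind, the rows returned by `pairwise_relative_positions` would themselves be treated as descriptors and passed, with their own codebook, to an encoder such as `vlad_encode`.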

Keywords:
ENCODE; Pattern recognition (psychology); Pairwise comparison; Benchmark (surveying); Computer science; Encoding (memory); Position (finance); Artificial intelligence; Feature (linguistics); Distribution (mathematics); Action recognition; Mathematics; Geography; Cartography; Mathematical analysis

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.00
Refs: 38
Citation Normalized Percentile: 0.03

Topics

Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Video Analysis and Summarization
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
