JOURNAL ARTICLE

VLAD-SSTA: VLAD with Soft Spatio-Temporal Assignment for Action Recognition

Abstract

Characterizing videos with both spatial and temporal information is important, especially for human action recognition: spatial cues model human appearance, while dynamic motion needs to be represented by temporal cues. The vector of locally aggregated descriptors (VLAD), whose assignment lacks temporal information, can therefore be regarded as a suboptimal solution for action recognition. In this paper, VLAD with a soft spatio-temporal assignment, named VLAD-SSTA, is proposed to further boost action recognition performance by employing a soft assignment with spatio-temporal characteristics. Specifically, a Spatio-Temporal Aware module is devised with a series of 3D convolutions to capture these spatio-temporal characteristics. Experimental results show that the proposed approach yields state-of-the-art performance on challenging datasets.
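The abstract does not give the formulation, so the following is only a minimal sketch of generic soft-assignment VLAD, not the authors' VLAD-SSTA. It assumes the assignment weights come from a softmax over negative squared distances to the cluster centers; in the paper, the weights are instead produced by the Spatio-Temporal Aware module with 3D convolutions, which is not reproduced here. The names `soft_assign`, `soft_vlad`, and the sharpness parameter `alpha` are hypothetical.

```python
import math


def soft_assign(x, centers, alpha=1.0):
    """Return softmax weights of descriptor x over the cluster centers.

    Assumption: weights depend only on squared Euclidean distance,
    scaled by a hypothetical sharpness parameter alpha.
    """
    logits = [-alpha * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
              for c in centers]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vlad(descriptors, centers, alpha=1.0):
    """Aggregate local descriptors into an L2-normalized K*D vector."""
    k, d = len(centers), len(centers[0])
    vlad = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        weights = soft_assign(x, centers, alpha)
        for j, (w, c) in enumerate(zip(weights, centers)):
            for i in range(d):
                # Weighted residual between the descriptor and center j.
                vlad[j][i] += w * (x[i] - c[i])
    flat = [v for row in vlad for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    # Leave an all-zero vector unchanged to avoid dividing by zero.
    return [v / norm for v in flat] if norm > 0 else flat


# Toy usage: three 2-D descriptors, two cluster centers.
descriptors = [[0.0, 0.1], [1.0, 0.9], [0.1, 0.0]]
centers = [[0.0, 0.0], [1.0, 1.0]]
video_vector = soft_vlad(descriptors, centers)
```

In this sketch each descriptor is assigned independently of its neighbors in space and time, which is the limitation the abstract attributes to plain VLAD.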

Keywords:
Computer science; Action recognition; Artificial intelligence; Action (physics); Economic shortage; Pattern recognition (psychology); Motion (physics); Temporal database; Dynamics (music); Computer vision; Data mining

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.11
Refs: 10
Citation Normalized Percentile: 0.49

Topics

Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Gait Recognition and Analysis
Physical Sciences →  Engineering →  Biomedical Engineering
Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
