JOURNAL ARTICLE

Action Recognition based on Video Spatio-Temporal Transformer

Mingyang Qiao, Tiantian Yuan

Year: 2022
Journal: 2022 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA)
Vol: 1
Pages: 477-481

Abstract

Video action recognition has become an important research topic in computer vision. Current deep learning methods for action recognition, such as C3D and 3D ResNet networks, lack an attention mechanism and are expensive to train on GPUs, which makes them poorly cost-effective. This study proposes a new network structure, R-TST. It first uses an LSTM module to correlate the frames of the video, preserving as much of the video's action information as possible. The TST module then applies temporal attention and spatial attention to strengthen the expressive power of the features used for action recognition. The experimental results show that the R-TST network structure can outperform other network structures and improve the utilization rate while saving GPU hardware costs, but with a slight decrease in accuracy on the UCF101 and HMDB51 datasets.
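The temporal and spatial attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no architectural details, so the tensor layout `(T, P, D)` (frames, patches per frame, feature dimension), the temporal-then-spatial ordering, the residual connections, the identity query/key/value projections, and the function names `self_attention` and `spatio_temporal_attention` are all assumptions made for illustration. The LSTM stage is omitted; the input `feats` stands in for whatever per-frame features that stage would produce.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    """Scaled dot-product self-attention over the second-to-last axis.

    x has shape (..., n, d). Queries, keys and values are all x itself
    (identity projections), an assumption that keeps the sketch short.
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., n, n)
    weights = softmax(scores, axis=-1)                # each row sums to 1
    return weights @ x, weights


def spatio_temporal_attention(feats):
    """Apply temporal attention, then spatial attention (hypothetical order).

    feats: array of shape (T, P, D).
    """
    # Temporal attention: each patch position attends across the T frames.
    per_patch = np.swapaxes(feats, 0, 1)              # (P, T, D)
    out_t, w_t = self_attention(per_patch)            # w_t: (P, T, T)
    x = feats + np.swapaxes(out_t, 0, 1)              # residual (assumed)

    # Spatial attention: each frame attends across its P patches.
    out_s, w_s = self_attention(x)                    # w_s: (T, P, P)
    return x + out_s, w_t, w_s                        # residual (assumed)


# Toy input: 8 frames, 16 patches per frame, 32-dimensional features.
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16, 32))
out, w_t, w_s = spatio_temporal_attention(feats)
```

Factorising attention this way means each token attends to `T` frames and then `P` patches rather than all `T * P` tokens jointly, which is one common way to reduce attention cost on video; whether R-TST factorises attention in this manner is not stated in the abstract.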

Keywords:
Computer science; Action recognition; Graphics; Artificial intelligence; Transformer; Frame rate; Deep learning; Hotspot (geology); Pattern recognition (psychology); Machine learning; Computer graphics (images)

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 0.28
Refs: 34
Citation Normalized Percentile: 0.58

Topics

Human Pose and Action Recognition
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Gait Recognition and Analysis
Physical Sciences → Engineering → Biomedical Engineering
Anomaly Detection Techniques and Applications
Physical Sciences → Computer Science → Artificial Intelligence