Video action recognition has become an important research topic in computer vision. Existing deep-learning methods for action recognition, such as the C3D and 3D ResNet networks, lack attention mechanisms and are expensive to train on GPUs. This study proposes a new R-TST network structure that first uses an LSTM module to correlate the frames of a video, preserving as much of the action's feature information as possible. The TST module combines temporal attention and spatial attention to strengthen the expressive power of the features for action recognition. Experimental results show that the R-TST network improves GPU utilization and reduces hardware cost compared with other network structures, at the price of a slight decrease in accuracy on the UCF101 and HMDB51 datasets.
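The abstract does not give the TST module's equations, but the combination of temporal and spatial attention it describes can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function name `tst_attention`, the pooled-descriptor scoring, and the weight vectors `w_t` and `w_s` are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def tst_attention(feats, w_t, w_s):
    """Apply illustrative temporal and spatial attention to clip features.

    feats: (T, H, W, C) features for T frames; w_t, w_s: (C,) hypothetical
    learned score vectors standing in for the module's parameters.
    """
    T, H, W, C = feats.shape
    # Temporal attention: one weight per frame from its spatially pooled descriptor
    frame_desc = feats.mean(axis=(1, 2))           # (T, C)
    t_weights = softmax(frame_desc @ w_t, axis=0)  # (T,), sums to 1 over frames
    # Spatial attention: one weight per location within each frame
    s_scores = feats.reshape(T, H * W, C) @ w_s    # (T, H*W)
    s_weights = softmax(s_scores, axis=1).reshape(T, H, W, 1)
    # Reweight features spatially, then temporally
    out = feats * s_weights * t_weights[:, None, None, None]
    return out, t_weights

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 7, 7, 16))            # 8 frames, 7x7 grid, 16 channels
out, t_w = tst_attention(feats, rng.normal(size=16), rng.normal(size=16))
print(out.shape)  # (8, 7, 7, 16): same shape as the input, now attention-weighted
```

In a real network the score vectors would be learned jointly with the backbone, and the weighted features would feed the classifier; the point here is only that temporal and spatial attention each reduce to a softmax over a different axis of the same feature tensor.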