Zero-Shot Action Recognition with Transformer-based Video Semantic Embedding

Keval Doshi; Yasin Yılmaz

doi:10.1109/cvprw59228.2023.00514

JOURNAL ARTICLE

Year: 2023 Pages: 4859-4868

Abstract

While video action recognition has been an active area of research for several years, zero-shot action recognition has only recently started gaining traction. In this work, we propose a novel end-to-end trained transformer model that captures long-range spatiotemporal dependencies efficiently, in contrast to existing approaches that rely on 3D-CNNs. Moreover, to address a common ambiguity in existing work about which classes can be considered previously unseen, we propose a new experimental setup that satisfies the zero-shot learning premise for action recognition by avoiding overlap between the training and testing classes. The proposed approach significantly outperforms the state of the art in zero-shot action recognition in terms of top-1 accuracy on the UCF-101, HMDB-51, and ActivityNet datasets.
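
The abstract does not spell out the inference procedure or the exact overlap criterion, so the following is only a minimal sketch of how zero-shot evaluation is commonly set up, not the authors' implementation. It assumes that a video is classified by the nearest class-name embedding under cosine similarity, and that overlap is detected by exact match of normalized class names. The helper names (`remove_overlapping_classes`, `predict_zero_shot`, `top1_accuracy`) and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def remove_overlapping_classes(train_classes, test_classes):
    """Drop training classes whose normalized name equals a test class name.

    Assumption: exact name matching; the paper may use a different criterion.
    """
    def normalize(name):
        return "".join(ch for ch in name.lower() if ch.isalnum())
    unseen = {normalize(c) for c in test_classes}
    return [c for c in train_classes if normalize(c) not in unseen]

def predict_zero_shot(video_embedding, class_embeddings):
    """Return the class whose semantic embedding is closest to the video embedding."""
    return max(class_embeddings,
               key=lambda c: cosine(video_embedding, class_embeddings[c]))

def top1_accuracy(predictions, labels):
    """Fraction of samples whose single best prediction equals the label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Toy data (made up for illustration only).
train = remove_overlapping_classes(
    ["Playing Guitar", "Archery", "Surfing"], ["archery", "Juggling Balls"])
class_embeddings = {"archery": [1.0, 0.0, 0.0], "juggling": [0.0, 1.0, 0.0]}
videos = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
labels = ["archery", "juggling"]
predictions = [predict_zero_shot(v, class_embeddings) for v in videos]
accuracy = top1_accuracy(predictions, labels)
```

In a real pipeline the video embeddings would come from the trained video model and the class embeddings from a text encoder. Both are replaced by hand-written vectors here to keep the sketch self-contained.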

Keywords:

Computer science; Transformer; Embedding; Ambiguity; Artificial intelligence; Action recognition; Shot (pellet); Zero (linguistics); Pattern recognition (psychology); Speech recognition; Engineering; Voltage

Metrics

FWCI (Field Weighted Citation Impact): 1.82

Citation Normalized Percentile: 0.83

Topics

Human Pose and Action Recognition

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Anomaly Detection Techniques and Applications

Physical Sciences → Computer Science → Artificial Intelligence

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
