Muna Almushyti, Frederick W. B. Li
Recognizing human-object interactions is challenging due to their spatio-temporal changes. We propose the SpatioTemporal Interaction Transformer-based (STIT) network to reason about such changes. Specifically, spatial transformers learn human and object context at a specific frame time. A temporal transformer then learns higher-level relations between the spatial context representations at different time steps, capturing long-term dependencies across frames. We further investigate multiple hierarchy designs for learning human interactions. We achieve superior performance on the Charades, Something-Something v1, and CAD-120 datasets, compared to baseline models that do not learn human-object relations, as well as to prior graph-based networks. We also achieve state-of-the-art accuracy of 95.93% on the CAD-120 dataset [1] using RGB data only.
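The two-stage reasoning the abstract describes — per-frame spatial attention over human/object entities, followed by temporal attention over frame-level contexts — can be sketched as below. This is a minimal illustrative sketch built on standard PyTorch transformer modules, not the authors' STIT implementation; the module name, feature dimensions, and mean-pooling step are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SpatioTemporalSketch(nn.Module):
    """Illustrative sketch: spatial transformer per frame, then a
    temporal transformer across frame-level context vectors."""

    def __init__(self, d_model=64, nhead=4):
        super().__init__()
        spatial_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        temporal_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.spatial = nn.TransformerEncoder(spatial_layer, num_layers=1)
        self.temporal = nn.TransformerEncoder(temporal_layer, num_layers=1)

    def forward(self, x):
        # x: (batch, frames, entities, d_model) — per-frame human/object features
        b, t, n, d = x.shape
        # Spatial attention among human/object entities within each frame
        s = self.spatial(x.reshape(b * t, n, d))
        # Pool entities into a single context vector per frame (an assumed choice)
        frame_ctx = s.mean(dim=1).reshape(b, t, d)
        # Temporal attention across frame contexts captures long-term dependencies
        return self.temporal(frame_ctx)

model = SpatioTemporalSketch()
out = model(torch.randn(2, 8, 5, 64))  # 2 clips, 8 frames, 5 entities per frame
print(out.shape)
```

The key design point mirrored here is the hierarchy: entity-level relations are resolved first, so the temporal transformer operates on compact per-frame summaries rather than on every entity at every time step.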