JOURNAL ARTICLE

Exploring Spatio–Temporal Graph Convolution for Video-Based Human–Object Interaction Recognition

Ning Wang, Guangming Zhu, Hongsheng Li, Mingtao Feng, Xia Zhao, Lan Ni, Peiyi Shen, Lin Mei, Liang Zhang

Year: 2023
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Vol: 33 (10)
Pages: 5814-5827
Publisher: Institute of Electrical and Electronics Engineers

Abstract

Video-based human-object interaction recognition is a challenging task since the state of objects as well as their correlations change constantly in the video. Existing methods mainly use 3DCNN or use separate components (e.g., GCN + RNN) to model the spatial correlation or the temporal correlation respectively, but ignore modeling spatio-temporal correlations simultaneously and long-term temporal dynamics of objects. In this paper, we propose a novel model, named Spatio-Temporal Interaction Graph Parsing Networks (STIGPN), for human-object interaction recognition in videos. STIGPN captures both spatial and temporal correlations simultaneously and thus can capture intra-frame and inter-frame dependencies efficiently and effectively. To model long-term temporal dynamics of objects, we introduce spatio-temporal feature enhancement, which can improve the detection of the salient human-object interaction pairs. We explore three types of spatio-temporal graph convolutions to simultaneously capture the spatio-temporal correlations and assess their effectiveness as the basic building block of STIGPN. Extensive experiments on CAD-120, Something-Else and Charades datasets show that our proposed solution leads to competitive results compared with the state-of-the-art methods. Code for STIGPN is available at: https://github.com/NingWang2049/STIGPN2
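To make the idea of capturing spatial and temporal correlations in a single operation more concrete, here is a minimal, generic sketch. It is not the STIGPN implementation and does not correspond to any of the three convolution variants the abstract mentions; the authors' repository is the reference for those. The sketch assumes one node per human or object in each frame, fully connected spatial edges within a frame, temporal edges that link each node to itself in the adjacent frame, and a standard symmetrically normalized graph convolution. The function names `build_st_adjacency` and `st_graph_conv`, the tensor layout, and all sizes are hypothetical.

```python
import numpy as np


def build_st_adjacency(num_frames, num_nodes):
    """Joint spatio-temporal adjacency over num_frames * num_nodes nodes.

    Assumed layout (hypothetical): node i of frame t sits at index t * num_nodes + i.
    """
    size = num_frames * num_nodes
    adj = np.zeros((size, size))
    idx = np.arange(num_nodes)
    for t in range(num_frames):
        base = t * num_nodes
        # Spatial (intra-frame) edges: fully connect the nodes of frame t.
        # The diagonal of this block also provides the self-loops.
        adj[base:base + num_nodes, base:base + num_nodes] = 1.0
        if t + 1 < num_frames:
            nxt = base + num_nodes
            # Temporal (inter-frame) edges: link each node to itself in frame t + 1.
            adj[base + idx, nxt + idx] = 1.0
            adj[nxt + idx, base + idx] = 1.0
    return adj


def st_graph_conv(features, adj, weight):
    """One graph-convolution step: ReLU(D^-1/2 A D^-1/2 X W).

    features: (num_frames, num_nodes, in_dim); weight: (in_dim, out_dim).
    Because adj mixes intra-frame and inter-frame edges, a single step
    aggregates spatial and temporal neighbors at the same time.
    """
    num_frames, num_nodes, in_dim = features.shape
    x = features.reshape(num_frames * num_nodes, in_dim)
    degree = adj.sum(axis=1)  # at least num_nodes, so never zero
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    out = np.maximum(norm_adj @ x @ weight, 0.0)
    return out.reshape(num_frames, num_nodes, -1)


# Toy example: 4 frames, 3 nodes per frame (e.g. one human and two objects).
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 3, 8))
adj = build_st_adjacency(4, 3)
w = rng.normal(size=(8, 16))
out = st_graph_conv(feats, adj, w)
print(out.shape)  # (4, 3, 16)
```

By contrast, a separate-component pipeline of the kind the abstract describes (e.g., GCN + RNN) would apply the spatial aggregation per frame and only afterwards pass the result through a temporal model, whereas the joint adjacency here lets one layer mix both kinds of neighbors.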

Keywords:
Computer science, Artificial intelligence, Graph, Pattern recognition (psychology), Salient, Parsing, Frame (networking), Object (grammar), Computer vision, Theoretical computer science

Metrics

Cited By: 16
FWCI (Field Weighted Citation Impact): 2.91
Refs: 68
Citation Normalized Percentile: 0.89

Topics

Human Pose and Action Recognition
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Multimodal Machine Learning Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Video Surveillance and Tracking Methods
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition