JOURNAL ARTICLE

Graph Representation for Weakly-Supervised Spatio-Temporal Action Detection

Abstract

Spatio-temporal action recognition and localization are crucial in several computer vision applications, including video surveillance and video captioning. However, most existing action recognition and localization approaches are designed for offline use and perform well only on trimmed action clips. They also need precise annotations at the clip, frame, and pixel levels, which are labor-intensive to produce and thus limit their use in real-world, large-scale scenarios. In this paper, we propose a weakly-supervised approach to spatio-temporal action recognition and localization in untrimmed videos, based on graph representation. More specifically, we propose an efficient graph representation of videos that uses only clip-level annotations, whereas existing approaches are either supervised or unsupervised. For graph construction, the local actions are determined based on the key interesting demeanor in an action clip and are assigned the same class label as the clip. This weak annotation significantly affects both action recognition and localization, because the local actions have considerable intra-class variability and inter-class similarity. To handle the intra-class variability and inter-class similarity, we apply a weakly-supervised deep multiple instance ranking framework to the local action descriptors. To classify a graph of local actions into one of the action classes, we use a support vector machine with a graph kernel, and then localize the recognized action as a non-cubic-shaped portion of the video based on the local actions in the graph. The experimental results show that the proposed approach outperforms state-of-the-art methods on three benchmark datasets, namely THUMOS14, UCF-Sports, and JHMDB-21.
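The two learning components named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss or graph kernel, so everything here is an assumption made for illustration: a hinge-style ranking loss over the top-scoring instance of each bag stands in for the multiple instance ranking objective, and a vertex-label histogram kernel stands in for the graph kernel. The function names, the margin value, and the discrete node labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def mil_ranking_loss(pos_bag_scores, neg_bag_scores, margin=1.0):
    """Hinge ranking loss between the top-scoring instances of two bags.

    pos_bag_scores: scores of local actions from a clip labeled with the class.
    neg_bag_scores: scores of local actions from a clip of a different class.
    The margin of 1.0 is an assumed default, not a value from the paper.
    """
    return max(0.0, margin - max(pos_bag_scores) + max(neg_bag_scores))


def vertex_histogram_kernel(labels_a, labels_b):
    """Count pairs of nodes, one from each graph, that share a label.

    Each graph is reduced to a list of discrete node labels (an assumed
    simplification; edges are ignored by this particular kernel).
    """
    hist_a, hist_b = Counter(labels_a), Counter(labels_b)
    return sum(count * hist_b[label] for label, count in hist_a.items())


def gram_matrix(graphs):
    """Pairwise kernel values for a list of graphs."""
    return [[vertex_histogram_kernel(g, h) for h in graphs] for g in graphs]


# Hypothetical graphs of local actions, one per video.
graphs = [
    ["run", "jump", "jump"],
    ["run", "jump"],
    ["swing", "swing"],
]
K = gram_matrix(graphs)
loss = mil_ranking_loss([0.2, 0.9], [0.1, 0.3])
```

A Gram matrix such as `K` is the form of input that a support vector machine with a precomputed kernel expects, for example scikit-learn's `SVC(kernel="precomputed")`. The ranking loss is zero once the best instance of the positive bag outscores the best instance of the negative bag by at least the margin.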

Keywords:
Computer science, Artificial intelligence, Graph, Pattern recognition (psychology), Action recognition, Machine learning, Annotation, Class (philosophy), Theoretical computer science

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.36
Refs: 52
Citation Normalized Percentile: 0.53

Topics

Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Anomaly Detection Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence
Gait Recognition and Analysis
Physical Sciences →  Engineering →  Biomedical Engineering

Related Documents

JOURNAL ARTICLE

Mask attention-guided graph convolution layer for weakly supervised temporal action detection

Mengyao Zhao, Zhengping Hu, Shufang Li, Shuai Bi, Zhe Sun

Journal: Multimedia Tools and Applications, Year: 2021, Vol: 81 (3), Pages: 4323-4340

JOURNAL ARTICLE

Dynamic Graph Modeling for Weakly-Supervised Temporal Action Localization

Haichao Shi, Xiaoyu Zhang, Changsheng Li, Lixing Gong, Yong Li, Yongjun Bao

Journal: Proceedings of the 30th ACM International Conference on Multimedia, Year: 2022, Pages: 3820-3828

JOURNAL ARTICLE

Weakly Supervised Temporal Action Detection With Temporal Dependency Learning

Bairong Li, Ruixin Liu, Tianquan Chen, Yuesheng Zhu

Journal: IEEE Transactions on Circuits and Systems for Video Technology, Year: 2021, Vol: 32 (7), Pages: 4473-4485