JOURNAL ARTICLE

Self-Supervised Spatiotemporal Representation Learning for Skeleton-Based Human Action Recognition

Jinhyeok Park, Seoung Bum Kim

Year: 2025 Journal: IEEE Access Vol: 13 Pages: 58164-58174 Publisher: Institute of Electrical and Electronics Engineers

Abstract

Skeleton-based human action recognition (HAR) plays an important role in video analytics and recognition systems, with the goal of accurately identifying human actions in videos. However, large-scale action annotation is costly, which has led to growing interest in HAR research using self-supervised learning (SSL). While existing SSL studies have focused on extracting global information from skeleton sequences, they often overlook local information that captures the relationships between joints and their subtle movements over time. In this study, we propose an SSL-based HAR framework called coarse-to-fine spatiotemporal representation masking (CFSEM) that effectively learns global, local, and temporal information within skeleton sequences. CFSEM captures not only global information in the skeleton using body- and part-level masking but also fine-grained movements using hand masking. In addition, temporal-axis shuffling is introduced into the proposed framework to account for temporal patterns inherent in skeleton sequences. To further enhance the learning process, the loss function is redefined using a cross-correlation matrix, introducing a non-contrastive SSL approach. Experiments on various datasets were conducted to evaluate the proposed framework against baseline methods. The results showed the superior performance of CFSEM and highlighted the possibility of training HAR models with less labeled data, offering the potential to develop HAR models effectively for various industries.
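
The abstract names three ingredients: joint masking, temporal-axis shuffling, and a loss built on a cross-correlation matrix. The following is a minimal sketch of how such pieces could look, not the authors' implementation. Everything here is an assumption for illustration: the `HAND_JOINTS` indices, the function names, the zero-fill masking, the Barlow Twins-style form of the loss, and the weight `lam`. The abstract specifies none of these details.

```python
import numpy as np

# Hypothetical joint indices standing in for the hand joints of a 25-joint
# skeleton; the abstract does not give the layout or the masked groups.
HAND_JOINTS = [21, 22, 23, 24]


def mask_joints(seq, joints):
    """Zero the given joints in a (frames, joints, coords) array."""
    masked = seq.copy()
    masked[:, joints, :] = 0.0
    return masked


def shuffle_time(seq, rng):
    """Return a copy of the sequence with its frames in random order."""
    return seq[rng.permutation(seq.shape[0])]


def cross_correlation_loss(z_a, z_b, lam=5e-3, eps=1e-8):
    """Assumed Barlow Twins-style loss on two (batch, dim) embeddings."""
    # Standardize each embedding dimension over the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # (dim, dim) cross-correlation matrix between the two views.
    c = z_a.T @ z_b / z_a.shape[0]
    diag = np.diag(c)
    on_diag = ((diag - 1.0) ** 2).sum()          # pull diagonal toward 1
    off_diag = (c ** 2).sum() - (diag ** 2).sum()  # push the rest toward 0
    return on_diag + lam * off_diag


rng = np.random.default_rng(0)
seq = rng.normal(size=(16, 25, 3))  # 16 frames, 25 joints, xyz coordinates
view = shuffle_time(mask_joints(seq, HAND_JOINTS), rng)

# Random stand-ins for encoder outputs; no encoder is modeled here.
z = rng.normal(size=(256, 8))
loss_matched = cross_correlation_loss(z, z + 0.01 * rng.normal(size=z.shape))
loss_unrelated = cross_correlation_loss(z, rng.normal(size=z.shape))
```

In this sketch, two nearly identical embedding batches yield a lower loss than two unrelated ones, which is the behavior a non-contrastive objective of this kind relies on: no negative pairs are needed, only agreement on the diagonal and decorrelation elsewhere.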

Keywords:
Computer science; Action recognition; Skeleton (computer programming); Artificial intelligence; Pattern recognition (psychology); Representation (politics); Human skeleton; Feature learning; Computer vision; Class (philosophy)

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 4.77
Refs: 26
Citation Normalized Percentile: 0.82

Topics

Human Pose and Action Recognition
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Anomaly Detection Techniques and Applications
Physical Sciences → Computer Science → Artificial Intelligence
Gait Recognition and Analysis
Physical Sciences → Engineering → Biomedical Engineering

Related Documents

JOURNAL ARTICLE

Adaptive Spatiotemporal Representation Learning for Skeleton-Based Human Action Recognition

Jiahui Yu, Hongwei Gao, Yongquan Chen, Dalin Zhou, Jinguo Liu, Zhaojie Ju

Journal: IEEE Transactions on Cognitive and Developmental Systems Year: 2021 Vol: 14 (4) Pages: 1654-1665

JOURNAL ARTICLE

Spatiotemporal consistency enhancement self-supervised representation learning for action recognition

Shuai Bi, Zhengping Hu, Mengyao Zhao, Shufang Li, Zhe Sun

Journal: Signal Image and Video Processing Year: 2022 Vol: 17 (4) Pages: 1485-1492

JOURNAL ARTICLE

Self-Supervised Representation Learning for Skeleton-Based Group Activity Recognition

Cunling Bian, Wei Feng, Song Wang

Journal: Proceedings of the 30th ACM International Conference on Multimedia Year: 2022 Pages: 5990-5998