Fine-grained action recognition poses a significant challenge because it requires distinguishing subtle motion variations within fine-grained action categories. Previous methods have improved performance by enhancing the network's ability to capture spatial variations and temporal changes in actions. However, they struggle to distinguish subtle differences between actions that involve varying numbers of repetitions. In this paper, we propose an effective module called the Self-similarity Attention Module (SAM). This module represents the self-similarity of actions using the Temporal Self-similarity Matrix (TSM) and applies channel-wise excitation to capture periodicity information in actions. The Self-similarity Attention Module can be embedded into any 3D convolutional neural network. Our approach outperforms previous skeleton-based action recognition methods on the widely used FineGym dataset, which confirms its effectiveness and efficiency.
Xiang Li, Shenglan Liu, Yunheng Li, Hao Liu, Jinjing Zhao, Lin Feng, Guihong Lao, Guangzhe Li
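To illustrate the Temporal Self-similarity Matrix underlying SAM, the sketch below computes a TSM from per-frame feature embeddings using cosine similarity. This is a minimal illustration under assumed conventions: the paper does not specify its similarity measure here, and the `temporal_self_similarity` function name and the (T, C) feature layout are our own choices.

```python
import numpy as np

def temporal_self_similarity(features: np.ndarray) -> np.ndarray:
    """Compute a Temporal Self-similarity Matrix (TSM) from per-frame features.

    features: array of shape (T, C), one C-dimensional embedding per frame.
    Returns a (T, T) matrix of pairwise cosine similarities between frames.
    Cosine similarity is an assumed choice; the paper's exact measure may differ.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normalized = features / np.clip(norms, 1e-8, None)  # guard against zero vectors
    return normalized @ normalized.T

# A periodic action yields a TSM with repeating diagonal stripes,
# which is the periodicity cue that channel-wise excitation can exploit.
frames = np.stack([np.array([np.sin(t), np.cos(t)])
                   for t in np.linspace(0, 4 * np.pi, 8)])
tsm = temporal_self_similarity(frames)
```

Because each frame is compared with every other frame, repetitions of the same sub-motion appear as off-diagonal bands in `tsm`, regardless of how many times the motion repeats.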