JOURNAL ARTICLE

Stacking-Based Attention Temporal Convolutional Network for Action Segmentation

Abstract

Action segmentation, which is implemented by frame-wise action classification, plays an important role in video understanding. Recent work on action segmentation captures long-term dependencies by adding more temporal convolution layers to Temporal Convolutional Networks (TCNs). However, higher layers in TCNs access video features at a coarser granularity, which loses the fine-grained information that frame-wise action classification needs. To address this issue, we propose a novel Attention-based Temporal Convolution (ATC) block that uses a self-attention mechanism to capture fine-grained temporal dependencies for frame-wise action classification. By stacking ATC blocks, we design a Stacking-based Attention Temporal Convolutional Network (SATC) that adaptively and simultaneously captures long-term and short-term dependencies according to the semantic similarity of features across different temporal receptive fields. Experimental results demonstrate that SATC outperforms the baselines on all three challenging datasets: GTEA, 50Salads, and Breakfast.
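The abstract does not specify the internals of the ATC block. As a rough illustration only, the Python sketch below assumes one plausible reading: each hypothetical `atc_block` applies a kernel-size-3 dilated temporal convolution with ReLU, then scaled dot-product self-attention over all frames (using the convolved features directly as queries, keys, and values, with no learned projections), and adds a residual connection. The hypothetical `satc_stack` doubles the dilation in each block so the convolutional receptive field grows, and `classify_frames` assigns one action label per frame. The function names, kernel size, dilation schedule, and random weights are all assumptions for illustration, not the authors' SATC implementation.

```python
import math
import random


def dilated_conv1d(x, weights, dilation):
    """Kernel-size-3 dilated temporal convolution with zero padding and ReLU.

    x: T frames, each a list of C floats.
    weights: 3 matrices of shape C x C, one per tap (t - d, t, t + d).
    Returns T frames with C channels, so the sequence length is preserved.
    """
    T, C = len(x), len(x[0])
    out = []
    for t in range(T):
        y = [0.0] * C
        for k, W in enumerate(weights):
            src = t + (k - 1) * dilation  # tap offsets: -d, 0, +d
            if 0 <= src < T:  # out-of-range taps act as zero padding
                for i in range(C):
                    y[i] += sum(W[i][j] * x[src][j] for j in range(C))
        out.append([max(v, 0.0) for v in y])  # ReLU
    return out


def self_attention(x):
    """Scaled dot-product self-attention over the time axis (assumed form).

    Every frame attends to every frame, weighted by the softmax of the
    feature dot products, i.e. by their semantic similarity.
    """
    T, C = len(x), len(x[0])
    scale = 1.0 / math.sqrt(C)
    out = []
    for t in range(T):
        scores = [scale * sum(a * b for a, b in zip(x[t], x[s])) for s in range(T)]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in scores]
        z = sum(exps)
        attn = [e / z for e in exps]
        out.append([sum(attn[s] * x[s][c] for s in range(T)) for c in range(C)])
    return out


def atc_block(x, weights, dilation):
    """Hypothetical ATC-style block: dilated conv, self-attention, residual add."""
    h = dilated_conv1d(x, weights, dilation)
    a = self_attention(h)
    return [[xv + av for xv, av in zip(xr, ar)] for xr, ar in zip(x, a)]


def satc_stack(x, num_blocks, seed=0):
    """Hypothetical stack of blocks with dilation 1, 2, 4, ... and random weights."""
    rng = random.Random(seed)
    C = len(x[0])
    for b in range(num_blocks):
        weights = [[[rng.gauss(0.0, 0.1) for _ in range(C)] for _ in range(C)]
                   for _ in range(3)]
        x = atc_block(x, weights, dilation=2 ** b)
    return x


def classify_frames(x, class_weights):
    """Frame-wise action classification: linear score per class, argmax per frame."""
    return [
        max(range(len(class_weights)),
            key=lambda k: sum(w * v for w, v in zip(class_weights[k], frame)))
        for frame in x
    ]
```

For example, `classify_frames(satc_stack(features, num_blocks=4), W_cls)` would return one class index per frame, where `features` is a hypothetical T x C list of per-frame features and `W_cls` holds one C-dimensional weight vector per action class. In this sketch the global attention lets every block relate distant frames by feature similarity, while the growing dilation widens the local convolutional context; how the paper actually combines the two is not stated in the abstract.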

Keywords:
Computer science, Convolution, Segmentation, Artificial intelligence, Frame, Block, Stacking, Similarity, Pattern recognition, Image

Metrics

Cited by: 4
FWCI (Field Weighted Citation Impact): 0.73
References: 25
Citation Normalized Percentile: 0.65

Topics

Human Pose and Action Recognition
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Diabetic Foot Ulcer Assessment and Management
Health Sciences → Medicine → Endocrinology, Diabetes and Metabolism
Anomaly Detection Techniques and Applications
Physical Sciences → Computer Science → Artificial Intelligence