JOURNAL ARTICLE

Temporal and Semantic Correlation Network for Weakly-Supervised Temporal Action Localization

Kang Lin, Wei Zhou, Zhijie Zheng, Dihu Chen, Tao Su

Year: 2025
Journal: ACM Transactions on Multimedia Computing, Communications, and Applications
Vol: 21 (5)
Pages: 1-23
Publisher: Association for Computing Machinery

Abstract

Weakly-Supervised Temporal Action Localization (WTAL) aims to classify actions and identify their temporal boundaries in untrimmed videos using only video-level labels during training. Despite recent progress, many existing approaches follow a localization-by-classification pipeline that treats snippets as independent instances and therefore exploits only limited contextual information. These methods also struggle to capture multi-scale temporal information and neglect both the internal temporal structure within videos and the semantic consistency across videos, which leads to misclassification and inaccurate localization. To address these limitations, we introduce TSC-Net, a novel Temporal and Semantic Correlation Network for the WTAL task that can be trained end-to-end. First, we propose a Multi-Scale Features Integration Pyramid (MFIP) module that integrates multi-scale temporal features, effectively addressing missed detections caused by short action durations. Second, we design a Temporal Correlation Enhancement (TCE) branch that uses video-level temporal structure to strengthen correlations between segments, improving the completeness of action localization. Finally, a Dataset-Wide Semantic Awareness (DSA) branch constructs and propagates a dataset-level action semantics bank, making the model more aware of the semantic consistency of actions. Extensive experiments show that TSC-Net outperforms most existing WTAL methods, achieving an average mAP of 46.3% on the THUMOS-14 dataset and 26.5% on the ActivityNet1.2 dataset. Detailed ablation studies further confirm the effectiveness of each component of the model. The code and models are publicly available at https://github.com/linkang-els/TSC-Net-main.
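The abstract contrasts TSC-Net with the common localization-by-classification pipeline, in which every snippet is scored on its own. The Python below is a rough sketch of that generic baseline only, not of TSC-Net: the MFIP, TCE, and DSA components are not described here in enough detail to reproduce. It assumes a precomputed class activation sequence (one score per snippet per class). Video-level scores come from top-k mean pooling, which is what lets video-level labels supervise training. Localization is done by thresholding one class's scores. Temporal IoU is the overlap measure that detection mAP is typically computed at, averaged over several thresholds. The function names, the top-k pooling, and the single fixed threshold are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Optional, Tuple

# Hypothetical helpers illustrating a generic localization-by-classification
# baseline; none of these names come from the TSC-Net code release.


def topk_video_scores(cas: List[List[float]], k: int) -> List[float]:
    """Video-level class scores from a class activation sequence (CAS).

    cas[t][c] is the score of snippet t for class c. Each class's
    video-level score is the mean of its k highest snippet scores
    (top-k multiple-instance pooling). Assumes k >= 1.
    """
    num_classes = len(cas[0])
    scores = []
    for c in range(num_classes):
        column = sorted((row[c] for row in cas), reverse=True)
        top = column[:k]  # fewer than k snippets: use all of them
        scores.append(sum(top) / len(top))
    return scores


def threshold_segments(cas: List[List[float]], c: int,
                       thr: float) -> List[Tuple[int, int]]:
    """Group consecutive snippets whose class-c score exceeds thr.

    Returns (start, end) pairs in snippet units, end exclusive. Each
    snippet is judged independently, which is the limitation the
    abstract attributes to this family of methods.
    """
    segments = []
    start: Optional[int] = None
    for t, row in enumerate(cas):
        if row[c] > thr and start is None:
            start = t
        elif row[c] <= thr and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:  # segment runs to the end of the video
        segments.append((start, len(cas)))
    return segments


def temporal_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Intersection over union of two [start, end) temporal intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Toy CAS: 5 snippets, 2 classes.
    cas = [[0.1, 0.9], [0.8, 0.2], [0.9, 0.1], [0.2, 0.7], [0.7, 0.3]]
    print(topk_video_scores(cas, k=2))
    print(threshold_segments(cas, c=0, thr=0.5))
    print(temporal_iou((1, 3), (2, 4)))
```

On the toy input, class 0 yields the segments (1, 3) and (4, 5). The dip at snippet 3 splits what might be one action into two fragments. This is the kind of incomplete localization that the TCE branch is said to target by modelling correlations between segments.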

Keywords:
Computer science, Artificial intelligence, Correlation, Action (physics), Natural language processing

Metrics

Cited by: 2
FWCI (Field Weighted Citation Impact): 9.55
References: 67
Citation Normalized Percentile: 0.92


Topics

Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Anomaly Detection Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence
Gait Recognition and Analysis
Physical Sciences →  Engineering →  Biomedical Engineering