JOURNAL ARTICLE

StochasticFormer: Stochastic Modeling for Weakly Supervised Temporal Action Localization

Haichao Shi, Xiaoyu Zhang, Changsheng Li

Year: 2023 Journal: IEEE Transactions on Image Processing Vol: 32 Pages: 1379-1389 Publisher: Institute of Electrical and Electronics Engineers

Abstract

Weakly supervised temporal action localization (WS-TAL) aims to identify the time intervals corresponding to actions of interest in untrimmed videos using only video-level weak supervision. Most existing WS-TAL methods face two common challenges, under-localization and over-localization, both of which severely degrade performance. To address these issues, this paper proposes a transformer-structured stochastic process modeling framework, StochasticFormer, which models finer-grained interactions among the intermediate predictions to further refine localization. StochasticFormer is built on a standard attention-based pipeline that derives preliminary frame/snippet-level predictions. A pseudo localization module then generates variable-length pseudo action instances with corresponding pseudo labels. Using the pseudo "action instance - action category" pairs as fine-grained pseudo supervision, the stochastic modeler learns the underlying interactions among the intermediate predictions with an encoder-decoder network. The encoder consists of a deterministic path and a latent path that capture local and global information, which the decoder subsequently integrates to obtain reliable predictions. The framework is optimized with three losses: the video-level classification loss, the frame-level semantic coherence loss, and the ELBO (evidence lower bound) loss. Extensive experiments on two benchmarks, THUMOS14 and ActivityNet1.2, show the efficacy of StochasticFormer compared with state-of-the-art methods.
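The abstract does not say how the pseudo localization module turns snippet-level predictions into variable-length pseudo action instances. As a rough illustration only, the sketch below uses a simple threshold-and-group heuristic: consecutive snippets whose top-scoring class is the same and whose score is above a threshold are merged into one instance. The function name `pseudo_instances`, the `threshold` parameter, and the grouping rule are assumptions made for this example, not the authors' method.

```python
from typing import List, Tuple


def pseudo_instances(
    scores: List[List[float]], threshold: float = 0.5
) -> List[Tuple[int, int, int]]:
    """Group consecutive confident snippets into pseudo action instances.

    scores[t][c] is the (hypothetical) score of class c at snippet t.
    Returns a list of (start, end_exclusive, class_index) tuples.
    This thresholding rule is an illustrative assumption.
    """
    instances: List[Tuple[int, int, int]] = []
    start = None  # start index of the currently open instance, if any
    label = None  # class index of the currently open instance, if any
    for t, row in enumerate(scores):
        # Top-scoring class at this snippet (first index wins on ties).
        cls = max(range(len(row)), key=row.__getitem__)
        active = row[cls] >= threshold
        if active and start is not None and cls == label:
            continue  # the open instance simply extends to this snippet
        if start is not None:
            # Close the open instance: class changed or score fell off.
            instances.append((start, t, label))
            start, label = None, None
        if active:
            start, label = t, cls  # open a new instance here
    if start is not None:
        instances.append((start, len(scores), label))
    return instances


if __name__ == "__main__":
    demo = [[0.1, 0.2], [0.9, 0.1], [0.8, 0.1], [0.2, 0.7], [0.1, 0.1], [0.6, 0.3]]
    print(pseudo_instances(demo))
```

Each returned tuple pairs a variable-length interval with a class index, which is the shape of the "action instance - action category" supervision the abstract describes; how the paper scores or filters such pairs is not specified here.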

Keywords:
Computer science, Encoder, Artificial intelligence, Frame (networking), Snippet, Pattern recognition (psychology), Stochastic process, Machine learning, Data mining, Mathematics, Statistics

Metrics

Cited By: 13
FWCI (Field Weighted Citation Impact): 2.37
Refs: 53
Citation Normalized Percentile: 0.86

Topics

Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Diabetic Foot Ulcer Assessment and Management
Health Sciences →  Medicine →  Endocrinology, Diabetes and Metabolism
Advanced Vision and Imaging
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Modeling Sub-Actions for Weakly Supervised Temporal Action Localization

Linjiang Huang, Yan Huang, Wanli Ouyang, Liang Wang

Journal: IEEE Transactions on Image Processing Year: 2021 Vol: 30 Pages: 5154-5167
JOURNAL ARTICLE

Dynamic Graph Modeling for Weakly-Supervised Temporal Action Localization

Haichao Shi, Xiaoyu Zhang, Changsheng Li, Lixing Gong, Yong Li, Yongjun Bao

Journal: Proceedings of the 30th ACM International Conference on Multimedia Year: 2022 Pages: 3820-3828
JOURNAL ARTICLE

ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization

Bo He, Xitong Yang, Le Kang, Zhiyu Cheng, Xin Zhou, Abhinav Shrivastava

Journal: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Year: 2022 Pages: 13915-13925
JOURNAL ARTICLE

Temporal Dropout for Weakly Supervised Action Localization

Chi Xie, Zikun Zhuang, Shengjie Zhao, Shuang Liang

Journal: ACM Transactions on Multimedia Computing Communications and Applications Year: 2022 Vol: 19 (3) Pages: 1-24