JOURNAL ARTICLE

End-to-End, Single-Stream Temporal Action Detection in Untrimmed Videos

Abstract

In this work, we present a new, intuitive, end-to-end approach for temporal action detection in untrimmed videos. We introduce our new architecture for Single-Stream Temporal Action Detection (SS-TAD), which effectively integrates joint action detection with its semantic sub-tasks in a single unifying end-to-end framework. We develop a method for training our deep recurrent architecture by enforcing semantic constraints on intermediate modules that are gradually relaxed as learning progresses. We find that such a dynamic learning scheme enables SS-TAD to achieve higher overall detection performance with fewer training epochs. By design, our single-pass network is very efficient and can operate at 701 frames per second, while simultaneously outperforming state-of-the-art methods for temporal action detection on THUMOS'14.
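The abstract describes training with semantic constraints on intermediate modules that are gradually relaxed as learning progresses. A minimal sketch of one way such a scheme could work, assuming a simple linear relaxation schedule (the function names and schedule are illustrative, not the authors' implementation):

```python
# Hypothetical sketch, not the authors' code: model the gradually
# relaxed semantic constraint as an auxiliary-loss weight that
# decays linearly from `start` to `end` over the course of training.

def constraint_weight(epoch, total_epochs, start=1.0, end=0.0):
    """Linearly relax the semantic-constraint weight as training progresses."""
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    return start + (end - start) * frac

def total_loss(detection_loss, constraint_loss, epoch, total_epochs):
    """Detection loss plus the gradually relaxed constraint penalty."""
    return detection_loss + constraint_weight(epoch, total_epochs) * constraint_loss
```

Early in training the constraint term dominates, steering the intermediate modules toward their intended semantics; by the final epochs the weight reaches zero and the network optimizes the detection objective alone.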

Keywords:
Computer science, Action detection, Artificial intelligence, Pattern recognition

Metrics

Cited By: 239
FWCI (Field-Weighted Citation Impact): 13.73
References: 0
Citation Normalized Percentile: 0.99 (in top 1%)

Topics

Human Pose and Action Recognition (Computer Vision and Pattern Recognition)
Anomaly Detection Techniques and Applications (Artificial Intelligence)
Video Surveillance and Tracking Methods (Computer Vision and Pattern Recognition)