JOURNAL ARTICLE

Temporal Action Detection with Fused Two-Stream 3D Residual Neural Networks and Bi-Directional LSTM

Abstract

This work presents an architecture for localizing target events within long, untrimmed video sequences. We focus on finding the temporal boundaries of target visual actions while bypassing irrelevant events from other actions. Both appearance and motion information are crucial for discriminating between different actions. Based on this, we propose a trainable fused two-stream 3D convolutional neural network framework, integrated with a bi-directional Long Short-Term Memory sequence model (2-stream 3DCNN + LSTM) for learning. The two-stream CNN enables us to model features from both RGB and optical flow short video clips with a temporal resolution of $\delta = 16$ frames, extracted from the long input video sequence. The framework produces a sequence of class probability scores, one set for each video clip. Simple low-cost mean, average, and max filters are then used to localize and classify each relevant action instance and to label the whole video. The architecture exploits (1) the two-stream CNN design, (2) the spatiotemporal processing of 3D convolutional networks for capturing spatial and motion patterns, and (3) the sequence model's handling of temporal ordering and long-range dependencies, which yields robust classifications at each time step. We evaluate our framework on the THUMOS'15 dataset, attaining 98.9% accuracy and 35.8% mAP in the video-level classification and relevant action detection tasks, respectively.
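
The post-processing step described in the abstract, turning per-clip class probability scores into temporal boundaries, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the filter window, the decision threshold, or the clip stride, so the names `mean_filter`, `localize`, and `video_level_score`, the window of 3 clips, the threshold of 0.5, and the use of non-overlapping clips are all assumptions made for illustration. Only the clip length of 16 frames comes from the text.

```python
from typing import List, Tuple

CLIP_LEN = 16  # frames per clip (delta = 16 in the abstract)


def mean_filter(scores: List[float], window: int = 3) -> List[float]:
    """Centered moving average over per-clip scores.

    The window size is an assumption; edges use a shorter window.
    """
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def localize(
    scores: List[float], threshold: float = 0.5, window: int = 3
) -> List[Tuple[int, int, float]]:
    """Return (start_frame, end_frame, confidence) for one action class.

    A segment is a maximal run of clips whose smoothed score reaches the
    threshold (hypothetical value). end_frame is exclusive, and clips are
    assumed to be non-overlapping, so clip i covers frames
    [i * CLIP_LEN, (i + 1) * CLIP_LEN).
    """
    smoothed = mean_filter(scores, window)
    segments = []
    start = None
    # A sentinel below any threshold closes a run that reaches the last clip.
    for i, s in enumerate(smoothed + [float("-inf")]):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            confidence = max(smoothed[start:i])  # max filter over the run
            segments.append((start * CLIP_LEN, i * CLIP_LEN, confidence))
            start = None
    return segments


def video_level_score(scores: List[float]) -> float:
    """Video-level score for one class: max over all clip scores."""
    return max(scores)


if __name__ == "__main__":
    clip_scores = [0.1, 0.1, 0.9, 0.9, 0.9, 0.1, 0.1]  # made-up scores
    print(localize(clip_scores))
    print(video_level_score(clip_scores))
```

With the made-up scores above, clips 2 through 4 pass the threshold after smoothing, so the sketch reports a single segment spanning frames 32 to 80. In a multi-class setting the same routine would be run once per class on that class's score sequence.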

Keywords:
Computer science; Artificial intelligence; RGB color model; Residual; Convolutional neural network; Convolution (computer science); Pattern recognition (psychology); Optical flow; Sequence (biology); Computer vision; Feature extraction; Deep learning; Artificial neural network; Image (mathematics); Algorithm

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.00
Refs: 30
Citation Normalized Percentile: 0.19

Topics

Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Anomaly Detection Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence
Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition