JOURNAL ARTICLE

Video Based Action Recognition Using Spatial and Temporal Feature

Abstract

Recognizing actions in video sequences has many applications, including monitoring, assisted living, surveillance, and smart homes. Despite advances in deep learning methods, how best to process video data remains an open research question, because extracting temporal information is still a challenge. In this work, we propose a double-stream human action recognition architecture that combines a spatial feature stream with a temporal feature stream, providing both spatial and temporal features for video-based action recognition. For the spatial stream, individual video frames are extracted as input; for the temporal stream, optical flow images are extracted and fed to the deep learning network for temporal feature learning. We evaluated the proposed approach on the KTH database and achieved superior results compared with traditional methods. To further improve recognition accuracy, we experimented with a fine-tuning mechanism to optimize the parameters of the deep learning network. Furthermore, we replaced the softmax classifier with a linear SVM to classify the combined spatial and temporal feature.
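
The final classification step described in the abstract (fusing the two streams' features and classifying them with a linear SVM instead of softmax) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how the features are fused or which SVM solver is used, so the sketch concatenates the two vectors and fits a binary linear SVM by plain subgradient descent on the regularized hinge loss. The helper names (`fuse`, `train_linear_svm`, `predict`) are hypothetical, and the feature values are hand-made placeholders standing in for network outputs.

```python
# Toy sketch: fuse two feature streams, then classify with a linear SVM.
# All feature values are made-up placeholders, not real network outputs.

def fuse(spatial_feat, temporal_feat):
    """Concatenate the spatial and temporal feature vectors of one clip
    (assumed fusion rule; the abstract does not specify one)."""
    return list(spatial_feat) + list(temporal_feat)

def score(w, b, x):
    """Raw linear score w.x + b of a fused feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=50):
    """Binary linear SVM (labels +1 / -1) fitted by subgradient descent
    on the L2-regularized hinge loss. Hypothetical helper, not a library API."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * score(w, b, xi) < 1.0
            # The regularizer shrinks w on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if violated:
                # The hinge subgradient pushes this sample's margin up.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the sign of the linear score."""
    return 1 if score(w, b, x) >= 0.0 else -1

# Placeholder clips: (spatial features, temporal features, label).
# +1 and -1 stand for two action classes, e.g. "walking" and "boxing".
clips = [
    ([1.0, 0.1], [0.9, 0.0], 1),
    ([0.9, 0.0], [1.0, 0.1], 1),
    ([0.1, 1.0], [0.0, 0.9], -1),
    ([0.0, 0.9], [0.1, 1.0], -1),
]
X_train = [fuse(s, t) for s, t, _ in clips]
y_train = [label for _, _, label in clips]

w, b = train_linear_svm(X_train, y_train)
new_clip = fuse([0.95, 0.05], [0.95, 0.05])
print("predicted label for new clip:", predict(w, b, new_clip))
```

A multi-class dataset such as KTH would need a one-vs-rest or similar extension of this binary sketch, and in practice an established SVM implementation would normally replace the hand-written training loop.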

Keywords:
Computer science; Softmax function; Artificial intelligence; Feature extraction; Pattern recognition (psychology); Classifier (UML); Optical flow; Deep learning; Feature (linguistics); Support vector machine; Feature learning; Image (mathematics)

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.29
Refs: 16
Citation Normalized Percentile: 0.59

Topics

Human Pose and Action Recognition
Video Surveillance and Tracking Methods
Video Analysis and Summarization
(All three: Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)