JOURNAL ARTICLE

Detecting video events based on action recognition in complex scenes using spatio-temporal descriptor

Abstract

Event detection plays an essential role in video content analysis and remains a challenging open problem. In particular, research on detecting human-related video events in complex scenes containing both crowds of people and dynamic motion is still limited. In this paper, we investigate detecting video events that involve elementary human actions, e.g. making a cellphone call, putting an object down, and pointing to something, in complex scenes using a novel spatio-temporal descriptor based approach. The proposed descriptor temporally integrates the statistics of a set of response maps of low-level features, e.g. image gradients and optical flow, within a space-time cube, and thereby captures the appearance and motion patterns that characterize actions. Using these descriptors, the bag-of-words method represents each human figure as a compact feature vector. These feature vectors are then used to train SVM classifiers at multiple spatial pyramid levels to distinguish different actions. Finally, Gaussian kernel based temporal filtering, which takes the temporal consistency of actions into account, segments event sequences from the video stream. The proposed approach tolerates spatial layout variations and local deformations of human actions caused by diverse view angles and rough alignment of human figures in complex scenes. Extensive experiments on the 50-hour video dataset of the TRECVid 2008 event detection task demonstrate that our approach outperforms well-known SIFT descriptor based methods and effectively detects video events under challenging real-world conditions.
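The pipeline summarized in the abstract (space-time descriptor, bag-of-words encoding, spatial pyramid, Gaussian temporal filtering) can be illustrated with the minimal NumPy sketch below. It is not the authors' implementation. As assumptions: a magnitude-weighted histogram of gradient orientations averaged over the frames of a cube stands in for the response-map statistics; optical flow and the SVM classifiers are omitted; codewords are assigned by nearest Euclidean distance; and every function name and parameter (cube_descriptor, bow_histogram, pyramid_histograms, segment_events, n_bins, sigma, threshold, levels) is hypothetical.

```python
import numpy as np


def cube_descriptor(frames, n_bins=8):
    """Hypothetical stand-in for the space-time cube descriptor: a
    magnitude-weighted gradient-orientation histogram per frame,
    averaged over the cube's frames (temporal integration), L2-normalized.
    Optical-flow response maps are omitted in this sketch."""
    frames = np.asarray(frames, dtype=float)  # shape (T, H, W)
    hists = []
    for frame in frames:
        gy, gx = np.gradient(frame)  # derivatives along rows (y) and columns (x)
        mag = np.hypot(gx, gy)
        ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi]
        idx = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
        hists.append(np.bincount(idx.ravel(), weights=mag.ravel(), minlength=n_bins))
    h = np.mean(hists, axis=0)
    return h / (np.linalg.norm(h) + 1e-12)


def bow_histogram(descriptors, codebook):
    """Bag-of-words: assign each descriptor to its nearest codeword
    (Euclidean distance); return the normalized histogram and the word ids."""
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0), words


def pyramid_histograms(positions, words, vocab_size, levels=2):
    """Per-level word histograms over a 2^l x 2^l grid laid on the human
    figure box; positions are (x, y) pairs normalized to [0, 1)."""
    positions = np.asarray(positions, dtype=float)
    words = np.asarray(words, dtype=int)
    per_level = []
    for level in range(levels):
        cells = 2 ** level
        cx = np.minimum((positions[:, 0] * cells).astype(int), cells - 1)
        cy = np.minimum((positions[:, 1] * cells).astype(int), cells - 1)
        bins = (cy * cells + cx) * vocab_size + words  # (cell, word) -> flat bin
        h = np.bincount(bins, minlength=cells * cells * vocab_size).astype(float)
        per_level.append(h / max(h.sum(), 1.0))
    return per_level


def gaussian_smooth(scores, sigma=2.0):
    """Convolve per-frame scores with a normalized Gaussian kernel,
    padding with edge values so the output keeps the input length."""
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(np.asarray(scores, dtype=float), radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")


def segment_events(scores, threshold=0.0, sigma=2.0):
    """Smooth per-frame classifier scores, then return inclusive
    (start, end) frame index pairs where the smoothed score exceeds threshold."""
    active = gaussian_smooth(scores, sigma) > threshold
    segments, start = [], None
    for i, is_on in enumerate(active):
        if is_on and start is None:
            start = i
        elif not is_on and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(active) - 1))
    return segments


if __name__ == "__main__":
    scores = [-1.0] * 10 + [1.0] * 10 + [-1.0] * 10
    print(segment_events(scores, sigma=1.0))  # prints [(10, 19)]
```

Smoothing before thresholding is what enforces temporal consistency in this sketch: an isolated one-frame positive score is averaged away by its negative neighbours, whereas a sustained run of positive scores survives as a single segment.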

Keywords:
Computer science, Action recognition, Artificial intelligence, Computer vision, Pattern recognition (psychology), Action (physics)

Metrics

Cited by: 57
FWCI (Field Weighted Citation Impact): 8.06
References: 42
Citation Normalized Percentile: 0.98

Topics (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)

Human Pose and Action Recognition
Video Analysis and Summarization
Video Surveillance and Tracking Methods
