JOURNAL ARTICLE

Dense Semantics-Assisted Networks for Video Action Recognition

Haonan Luo, Guosheng Lin, Yazhou Yao, Zhenmin Tang, Qingyao Wu, Xian-Sheng Hua

Year: 2021; Journal: IEEE Transactions on Circuits and Systems for Video Technology; Vol: 32 (5); Pages: 3073-3084; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Most existing action recognition approaches directly leverage video-level features to recognize human actions in videos. Although these methods have made remarkable progress, their accuracy remains unsatisfactory: when a test video involves complex backgrounds and activities, existing methods usually suffer a significant drop in accuracy. Human action is inherently a high-level concept. Merely applying a video classification model without a detailed semantic understanding of the video content (e.g., objects, scene context, object motions, and object interactions) is inadequate to tackle the challenges of action recognition. Fine-level semantic understanding of videos generates elementary semantic concepts from the raw video data, such as the semantics of objects and background regions, and can be employed to bridge the gap between the raw video data and the high-level concept of human actions. In this work, we leverage dense semantic segmentation masks, which encode rich semantic details, to provide extra information for network training and improve the performance of action recognition. We propose a novel deep architecture, named Dense Semantics-Assisted Convolutional Neural Networks (DSA-CNNs), that effectively utilizes the dense semantic information of a video through bottom-up attention in the spatial stream and through branch fusion in the temporal stream. To verify the effectiveness of our approach, we conduct extensive experiments on three publicly available datasets: UCF101, HMDB51, and Kinetics. The experimental results demonstrate that our approach substantially improves on existing methods and achieves very competitive performance. They also show that our approach is superior to other related methods that utilize extra information for action recognition.
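The abstract names two ways of injecting segmentation masks (bottom-up attention in the spatial stream, branch fusion in the temporal stream) but gives no layer-level details. The following is a minimal NumPy sketch of both ideas under our own assumptions, not the DSA-CNN implementation: class index 0 is treated as "background", the attention map A is the per-pixel probability of any non-background class, features are reweighted residually as F * (1 + A), and the two temporal branches are fused by a weighted average of class probabilities with a hypothetical weight `w`. All function names here are illustrative.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def semantic_attention(seg_logits: np.ndarray) -> np.ndarray:
    # seg_logits: (K, H, W) per-pixel scores for K semantic classes.
    # Assumption (hypothetical): class 0 is "background", so attention is
    # the probability that a pixel belongs to any non-background class.
    probs = softmax(seg_logits, axis=0)
    return 1.0 - probs[0]  # (H, W), values strictly between 0 and 1


def attend(features: np.ndarray, attention: np.ndarray) -> np.ndarray:
    # features: (C, H, W) spatial-stream feature map.
    # Residual-style reweighting F * (1 + A): keeps the original signal and
    # amplifies semantically labeled regions. Broadcasts A over channels.
    return features * (1.0 + attention[None, :, :])


def fuse_branches(scores_a: np.ndarray, scores_b: np.ndarray, w: float = 0.5) -> np.ndarray:
    # Late fusion of two temporal-stream branches (e.g., an optical-flow
    # branch and a semantic-mask branch) as a weighted average of class
    # probabilities. The weight w is a hypothetical hyperparameter.
    return w * softmax(scores_a, axis=-1) + (1.0 - w) * softmax(scores_b, axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seg = rng.normal(size=(5, 7, 7))    # 5 semantic classes on a 7x7 grid
    feat = rng.normal(size=(64, 7, 7))  # 64-channel feature map
    att = semantic_attention(seg)
    out = attend(feat, att)
    print(att.shape, out.shape)         # (7, 7) (64, 7, 7)
    fused = fuse_branches(rng.normal(size=101), rng.normal(size=101))
    print(fused.shape, round(float(fused.sum()), 6))  # (101,) 1.0
```

The residual form F * (1 + A) is chosen so that regions the segmenter marks as background are attenuated relative to labeled regions rather than zeroed out, which keeps context the classifier may still need; a learned gating layer would be an equally plausible alternative.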

Keywords:
Computer science, Leverage, Artificial intelligence, Convolutional neural network, Semantics (computer science), Segmentation, Encode, Action recognition, Semantic gap, Machine learning, Pattern recognition, Image retrieval, Image, Class

Metrics

Cited by: 38
FWCI (Field Weighted Citation Impact): 2.45
References: 82
Citation Normalized Percentile: 0.90

Topics

Human Pose and Action Recognition
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Anomaly Detection Techniques and Applications
Physical Sciences → Computer Science → Artificial Intelligence
Hand Gesture Recognition Systems
Physical Sciences → Computer Science → Human-Computer Interaction

Related Documents

JOURNAL ARTICLE

Dense Dilated Network for Video Action Recognition

Baohan Xu, Hao Ye, Yingbin Zheng, Heng Wang, Tianyu Luwang, Yu-Gang Jiang

Journal: IEEE Transactions on Image Processing; Year: 2019; Vol: 28 (10); Pages: 4941-4953

JOURNAL ARTICLE

Hierarchical Semantics Interaction for Compressed Video Action Recognition

Jinxin Guo, Yang Yang, Huaiwen Zhang, S-B Qian, Chunqiang Xu

Journal: IEEE Transactions on Circuits and Systems for Video Technology; Year: 2026; Pages: 1-1

BOOK-CHAPTER

A Video Action Recognition Model Guided by Temporal Action Semantics

Ze-Xu Ji, Jinqu Zhang

Lecture Notes in Computer Science; Year: 2025; Pages: 321-335

JOURNAL ARTICLE

Dense Network for Action Recognition from Video Snippets

Journal: Journal of Xidian University; Year: 2020; Vol: 14 (8)