JOURNAL ARTICLE

Spatiotemporal Saliency Representation Learning for Video Action Recognition

Yongqiang Kong, Yunhong Wang, Annan Li

Year: 2021, Journal: IEEE Transactions on Multimedia, Vol: 24, Pages: 1515-1528, Publisher: Institute of Electrical and Electronics Engineers

Abstract

Deep convolutional neural networks (CNNs) have achieved great success in human action recognition; however, they remain limited in understanding complex and noisy videos because appearance and motion information is difficult to exploit. Most existing work has been devoted to designing CNN architectures and overlooks the quality of the network inputs, which is of great importance. This paper offers an alternative route to improving action recognition by focusing on the quality of network inputs. A multi-task video salient object detection approach with an object-of-interest segmentation scheme, which takes into account both human and action-relevant cues, is proposed to immunize the input video from background clutter. Further, a simple spatiotemporal residual network architecture is presented, which operates on multiple high-quality inputs for long-term action representation learning. Empirical evaluations on various challenging datasets demonstrate that the proposed framework performs competitively against state-of-the-art methods. Besides improving performance, learning saliency representations can help prevent the action recognition model from overfitting and speed up training convergence.
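The idea of shielding the input video from background clutter can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify how saliency is applied to the frames, so the function name `apply_saliency_mask`, the threshold of 0.5, and the choice to zero out background pixels are all assumptions made purely for illustration. The sketch assumes a saliency map per frame is already available from some detector.

```python
import numpy as np


def apply_saliency_mask(frames, saliency, threshold=0.5, background=0.0):
    """Suppress background pixels in a clip using per-frame saliency maps.

    Hypothetical helper for illustration only.
    frames:   array of shape (T, H, W, C)
    saliency: array of shape (T, H, W), values assumed to lie in [0, 1]
    """
    frames = np.asarray(frames, dtype=np.float32)
    saliency = np.asarray(saliency, dtype=np.float32)
    if frames.shape[:3] != saliency.shape:
        raise ValueError("saliency must match frames in T, H, W")
    # Binarize the saliency map (assumed threshold) and add a channel
    # axis so the mask broadcasts over the colour channels.
    mask = (saliency >= threshold)[..., np.newaxis]
    # Keep salient pixels, replace everything else with the background value.
    return np.where(mask, frames, np.float32(background))


# Toy clip: 2 frames of 4x4 pixels with 3 channels, all ones.
clip = np.ones((2, 4, 4, 3), dtype=np.float32)
sal = np.zeros((2, 4, 4), dtype=np.float32)
sal[:, 1:3, 1:3] = 0.9  # pretend the salient object occupies the centre
masked = apply_saliency_mask(clip, sal)
```

In this toy example only the central 2x2 region of each frame survives; a recognition network trained on `masked` rather than `clip` would see the object of interest without the surrounding pixels.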

Keywords:
Computer science; Artificial intelligence; Overfitting; Convolutional neural network; Machine learning; Action recognition; Segmentation; Representation (politics); Feature learning; Action (physics); Deep learning; Clutter; Task (project management); Pattern recognition (psychology); Artificial neural network; Radar

Metrics

Cited By: 24
FWCI (Field Weighted Citation Impact): 2.15
Refs: 83
Citation Normalized Percentile: 0.88

Topics

Human Pose and Action Recognition
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Advanced Neural Network Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Multimodal Machine Learning Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition