Spatiotemporal Saliency Representation Learning for Video Action Recognition

Yongqiang Kong; Yunhong Wang; Annan Li

doi:10.1109/tmm.2021.3066775

JOURNAL ARTICLE

Year: 2021; Journal: IEEE Transactions on Multimedia; Vol: 24; Pages: 1515-1528; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Deep convolutional neural networks (CNNs) have achieved great success in human action recognition; however, they remain limited in understanding complex and noisy videos owing to the difficulty of exploiting appearance and motion information. Most existing work has been devoted to designing CNN architectures and overlooks the quality of the network inputs, which is of great importance. This paper offers an alternative way to improve action recognition by focusing on the quality of network inputs. A multi-task video salient object detection approach with an object-of-interest segmentation scheme, which takes into account both human and action-relevant cues, is proposed to shield the input video from background clutter. Further, a simple spatiotemporal residual network architecture is presented that operates on multiple high-quality inputs to learn long-term action representations. Empirical evaluations on various challenging datasets demonstrate that the proposed framework performs competitively against state-of-the-art methods. Beyond better performance, learning saliency representations helps prevent the action recognition model from overfitting and speeds up training convergence.
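The abstract's central idea is to improve the network inputs by suppressing background clutter around salient, action-relevant regions. The sketch below illustrates only that general idea of saliency-weighted input preparation; it is not the authors' method. It assumes per-frame saliency maps are already available (the paper obtains them from its multi-task video salient object detector, which is not reproduced here) and represents frames as plain grayscale grids. The helpers `fuse_cues`, `suppress_background`, and `mask_clip`, the pixel-wise max used to combine human and action-relevant cues, and the `floor` and `threshold` parameters are all hypothetical choices made for illustration.

```python
from typing import List, Optional

# Hypothetical data layout (illustrative assumption): one grayscale frame is an
# H x W grid of intensities, and one saliency map is an H x W grid of scores in [0, 1].
Grid = List[List[float]]


def _check_same_shape(a: Grid, b: Grid) -> None:
    """Raise if two grids do not share the same H x W shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grids must have the same H x W shape")


def fuse_cues(human_map: Grid, action_map: Grid) -> Grid:
    """Combine a human-saliency map and an action-relevance map by pixel-wise max.

    The max rule is an assumption: a pixel counts as salient if either cue marks it.
    """
    _check_same_shape(human_map, action_map)
    return [[max(h, a) for h, a in zip(hr, ar)] for hr, ar in zip(human_map, action_map)]


def suppress_background(frame: Grid, saliency: Grid, floor: float = 0.0,
                        threshold: Optional[float] = None) -> Grid:
    """Attenuate low-saliency pixels of one frame.

    Each pixel p with saliency s becomes p * (floor + (1 - floor) * s).
    floor = 0 removes non-salient pixels entirely; floor > 0 keeps some context.
    If threshold is given, s is first binarized (s >= threshold -> 1, else 0),
    imitating a hard object-of-interest segmentation mask.
    """
    if not 0.0 <= floor <= 1.0:
        raise ValueError("floor must lie in [0, 1]")
    _check_same_shape(frame, saliency)
    out: Grid = []
    for row, srow in zip(frame, saliency):
        new_row = []
        for p, s in zip(row, srow):
            s = min(max(s, 0.0), 1.0)  # clamp scores into [0, 1]
            if threshold is not None:
                s = 1.0 if s >= threshold else 0.0
            new_row.append(p * (floor + (1.0 - floor) * s))
        out.append(new_row)
    return out


def mask_clip(frames: List[Grid], saliency_maps: List[Grid], floor: float = 0.0,
              threshold: Optional[float] = None) -> List[Grid]:
    """Apply suppress_background frame by frame to a whole clip."""
    if len(frames) != len(saliency_maps):
        raise ValueError("need exactly one saliency map per frame")
    return [suppress_background(f, s, floor, threshold)
            for f, s in zip(frames, saliency_maps)]


if __name__ == "__main__":
    frame = [[10.0, 20.0], [30.0, 40.0]]
    saliency = [[1.0, 0.0], [0.5, 0.25]]
    print(suppress_background(frame, saliency))             # [[10.0, 0.0], [15.0, 10.0]]
    print(suppress_background(frame, saliency, floor=0.5))  # [[10.0, 10.0], [22.5, 25.0]]
```

Setting `floor` to 0 discards non-salient pixels entirely, which removes clutter most aggressively but also removes scene context that can itself be informative; a small positive floor is one way to trade these off, while `threshold` switches from soft weighting to a hard segmentation-style mask.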

Keywords:

Computer science, Artificial intelligence, Overfitting, Convolutional neural network, Machine learning, Action recognition, Segmentation, Representation, Feature learning, Action, Deep learning, Clutter, Task, Pattern recognition, Artificial neural network

Metrics

FWCI (Field Weighted Citation Impact): 2.15

Citation Normalized Percentile: 0.88

Topics

Human Pose and Action Recognition

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Neural Network Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Spatiotemporal Saliency for Human Action Recognition

Learning Attention-Enhanced Spatiotemporal Representation for Action Recognition

Learning hierarchical video representation for action recognition

Sparse coding-based spatiotemporal saliency for action recognition

Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition