Human action recognition has long been a focus of research in computer vision. Unlike traditional human action recognition methods based on handcrafted features, deep learning-based methods learn features automatically and have therefore attracted wide attention in recent years. However, as deep networks are increasingly applied to human action recognition, the information lost as convolutional layers grow deeper can no longer be ignored, because it ultimately degrades recognition performance. To address this problem, we propose a deep learning-based action recognition method with a multi-level feature fusion mechanism that makes full use of the detailed features in the intermediate layers of a CNN. Convolutional autoencoders are employed to reduce the dimensionality of the intermediate-layer features while preserving their representativeness. In addition, a joint optimization module is designed to reduce feature redundancy and further improve recognition performance. Experimental results demonstrate the superiority of the proposed method, which reaches an average action recognition accuracy of 92.54%.
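The abstract does not specify the autoencoder architecture, the fusion operator, or the joint optimization module. The following is therefore only a minimal sketch of the general idea under stated assumptions. It uses 1x1-kernel convolutional autoencoders trained with plain gradient descent on mean squared reconstruction error, followed by global average pooling and concatenation as the fusion step. All function names (`train_conv_autoencoder`, `encode_and_pool`, `fuse_multilevel`, `low_rank_map`) and the synthetic feature maps are hypothetical. The joint optimization module is not modeled.

```python
import numpy as np


def train_conv_autoencoder(feat, code_channels, lr=0.05, steps=1000, seed=0):
    """Fit a hypothetical 1x1 convolutional autoencoder to one (C, H, W) map.

    Assumption: kernels are 1x1, so encoder and decoder each apply one linear
    map to the channel vector at every spatial position. Returns the encoder
    weights plus the reconstruction loss before and after training.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)  # one column per spatial position
    rng = np.random.default_rng(seed)
    enc = rng.normal(scale=0.1, size=(code_channels, c))  # C -> k channels
    dec = rng.normal(scale=0.1, size=(c, code_channels))  # k -> C channels

    def mse():
        return float(np.mean((dec @ (enc @ x) - x) ** 2))

    initial = mse()
    for _ in range(steps):
        z = enc @ x                  # encode: (k, H*W)
        err = dec @ z - x            # reconstruction error: (C, H*W)
        g = 2.0 * err / err.size     # gradient of the mean squared error
        g_dec = g @ z.T              # both gradients use pre-update weights
        g_enc = dec.T @ g @ x.T
        dec -= lr * g_dec
        enc -= lr * g_enc
    return enc, initial, mse()


def encode_and_pool(feat, enc):
    """Compress a (C, H, W) map with the 1x1 encoder, then global-average-pool."""
    c, h, w = feat.shape
    z = enc @ feat.reshape(c, h * w)  # (k, H*W)
    return z.mean(axis=1)             # (k,)


def fuse_multilevel(mid_feats, encoders, top_feat):
    """Assumed fusion: concatenate pooled mid-level codes with the top feature."""
    pooled = [encode_and_pool(f, e) for f, e in zip(mid_feats, encoders)]
    return np.concatenate(pooled + [top_feat])


def low_rank_map(rng, c, h, w, rank):
    """Synthetic stand-in for a CNN feature map whose channels are redundant."""
    mix = rng.normal(size=(c, rank))
    latent = rng.normal(size=(rank, h * w))
    feat = (mix @ latent).reshape(c, h, w)
    return feat / feat.std()


rng = np.random.default_rng(42)
mid1 = low_rank_map(rng, 16, 8, 8, rank=4)  # hypothetical shallower layer
mid2 = low_rank_map(rng, 32, 4, 4, rank=4)  # hypothetical deeper layer
top = rng.normal(size=64)                   # stand-in for the last-layer vector

enc1, loss1_init, loss1 = train_conv_autoencoder(mid1, code_channels=4)
enc2, loss2_init, loss2 = train_conv_autoencoder(mid2, code_channels=8)
fused = fuse_multilevel([mid1, mid2], [enc1, enc2], top)
print(fused.shape)  # (76,): 4 + 8 + 64
```

Concatenation keeps every level's compressed code intact. A redundancy-reducing step such as the joint optimization module described above would then operate on a vector like `fused`, but its actual design is not given here.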
Yueshen Xu, Guang-can XIAO, Xiaofen Tang
Wei Song, Pei Yang, Ningning Liu, Guosheng Yang, Fuhong Lin