Cheng Dai, Xingang Liu, Luhao Zhong, Tao Yu
The recognition of actions from video sequences has many applications, such as monitoring, assisted living, surveillance, and smart homes. Despite advances in deep learning methods, how best to process video data is still an open research question, because extracting temporal information remains a challenge. In this work, we propose a two-stream human action recognition architecture that combines a spatial feature stream and a temporal feature stream, providing both spatial and temporal features for video-based action recognition. The spatial stream takes individual video frames as input, while the temporal stream takes optical flow images extracted from the video as input for temporal feature learning. We evaluated the proposed method on the KTH dataset, where it achieved better results than traditional methods. To further improve recognition accuracy, we applied fine-tuning to optimize the deep network parameters. Furthermore, we replaced the softmax classifier with a linear SVM to classify the combined spatial and temporal features.
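The abstract does not say how the two streams are combined or how the SVM is trained, so the following is only a rough sketch under stated assumptions: the combined feature is taken to be a plain concatenation of a per-video spatial vector and temporal vector, the CNN feature extractors are replaced by hand-made toy vectors, and the linear SVM is a small dual coordinate descent solver for the hinge loss, used one-vs-rest. The names `fuse`, `train_binary_svm`, `train_one_vs_rest`, `predict`, `accuracy` and `make_toy_data` are hypothetical, and none of this is the authors' code. The toy data mimics the motivation for a temporal stream: two classes that look identical in single frames but move differently.

```python
"""Toy sketch: fuse two-stream features, then classify with a linear SVM.

Assumptions (not taken from the paper): the combined feature is a plain
concatenation of the spatial and temporal vectors, CNN extractors are replaced
by hand-made toy vectors, and the SVM is a small dual coordinate descent
solver for the hinge loss rather than a specific library.
"""


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fuse(spatial, temporal):
    """Hypothetical late fusion: concatenate spatial and temporal features."""
    return list(spatial) + list(temporal)


def train_binary_svm(X, y, C=10.0, passes=200):
    """Train a linear SVM (hinge loss) by dual coordinate descent.

    X is a list of feature vectors and y a list of labels in {-1, +1}.
    A constant 1.0 is appended to every vector, so the last weight is a bias.
    """
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    alpha = [0.0] * len(Xb)      # dual variables, kept in [0, C]
    q = [dot(x, x) for x in Xb]  # diagonal entries of the dual Hessian
    for _ in range(passes):
        for i, x in enumerate(Xb):
            grad = y[i] * dot(w, x) - 1.0  # dual gradient for alpha[i]
            # Exact minimization along this coordinate, clipped to [0, C].
            new = min(max(alpha[i] - grad / q[i], 0.0), C)
            delta = new - alpha[i]
            if delta != 0.0:
                alpha[i] = new
                # Keep w = sum_i alpha_i * y_i * x_i in sync with alpha.
                w = [wj + delta * y[i] * xj for wj, xj in zip(w, x)]
    return w


def train_one_vs_rest(X, labels, **kwargs):
    """One binary SVM per class: that class (+1) versus all others (-1)."""
    return {
        c: train_binary_svm(X, [1 if lab == c else -1 for lab in labels], **kwargs)
        for c in sorted(set(labels))
    }


def predict(models, x):
    """Return the class whose SVM gives the largest decision value."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: dot(models[c], xb))


def accuracy(models, X, labels):
    hits = sum(predict(models, x) == lab for x, lab in zip(X, labels))
    return hits / len(labels)


def make_toy_data():
    """Synthetic (label, spatial, temporal) triples; values are made up.

    Jogging and running share the same spatial vectors (they look alike in a
    single frame) and differ only in their temporal (motion) vectors.
    """
    centers = {
        "boxing": ([1.0, 0.0], [0.0, 0.0]),
        "jogging": ([0.0, 1.0], [1.0, 0.0]),
        "running": ([0.0, 1.0], [0.0, 1.0]),
    }
    data = []
    for label, (s, t) in centers.items():
        for d in (0.0, 0.05, -0.05):
            data.append((label, [s[0] + d, s[1] - d], [t[0] + d, t[1] - d]))
    return data


if __name__ == "__main__":
    data = make_toy_data()
    labels = [lab for lab, _, _ in data]
    fused = [fuse(s, t) for _, s, t in data]
    spatial_only = [s for _, s, _ in data]
    print("fused accuracy:", accuracy(train_one_vs_rest(fused, labels), fused, labels))
    print("spatial-only accuracy:",
          accuracy(train_one_vs_rest(spatial_only, labels), spatial_only, labels))
```

On this toy data a spatial-only classifier cannot tell jogging from running, because their frame features are identical, whereas the fused vectors are linearly separable for all three classes. Concatenation is chosen here only because it is the simplest fusion; whether the paper fuses the streams this way is an assumption.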
Iveel Jargalsaikhan, Suzanne Little, Rémi Trichet, Noel E. O’Connor
Rongsen Wu, Jie Xu, Yuhang Zhang, Zixuan Li, Yiyao Li, Shixue Cheng