Pedestrians are vulnerable road users, and their actions serve as important cues for motion prediction and collision avoidance. In this paper, we address the problem of pedestrian action recognition for the first time. We first introduce a new dataset, the pedestrian action recognition dataset (PARD), which serves as a benchmark for experiments. We then provide an efficient baseline method, MFVGG, which matches the performance of previous methods at a lower computational cost. To better handle this problem, we improve the baseline in two ways: first, we leverage a pose prior to enrich the feature representations; second, we propose a two-stream neural architecture search (NAS) method to obtain a network architecture tailored to our task. Experiments on PARD show that our method outperforms previous top-performing action recognition methods. The dataset and code are publicly available at https://github.com/Yankeegsj/PARD
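The abstract does not specify how the pose prior is combined with the appearance stream. As a purely illustrative sketch (the function names, feature dimensions, and the concatenation-based late fusion below are assumptions, not the authors' method), a two-stream pipeline that fuses an appearance descriptor with a pose descriptor might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_rgb_features(frames):
    # Stand-in for an appearance stream (e.g., a VGG-style backbone):
    # average the frames over time and apply a random linear projection.
    pooled = frames.mean(axis=0).ravel()
    w = rng.standard_normal((pooled.size, 128)) / np.sqrt(pooled.size)
    return pooled @ w

def extract_pose_features(keypoints):
    # Stand-in for a pose stream: flatten per-frame 2-D keypoints and
    # average over time to obtain a fixed-length descriptor.
    return keypoints.reshape(keypoints.shape[0], -1).mean(axis=0)

def two_stream_logits(frames, keypoints, num_classes=4):
    # Late fusion: concatenate the two stream descriptors, then apply
    # a (randomly initialized) linear classification head.
    fused = np.concatenate([extract_rgb_features(frames),
                            extract_pose_features(keypoints)])
    head = rng.standard_normal((fused.size, num_classes)) / np.sqrt(fused.size)
    return fused @ head

# Toy clip: 8 RGB frames of 32x32 pixels, 8 frames of 17 2-D keypoints.
frames = rng.random((8, 32, 32, 3))
keypoints = rng.random((8, 17, 2))
logits = two_stream_logits(frames, keypoints)
print(logits.shape)  # (4,)
```

In practice each stream would be a trained network and the fusion strategy (and here, the architecture of each stream) would be found by the NAS procedure rather than fixed by hand.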
Zixuan Wang, Aichun Zhu, Fangqiang Hu, Qianyu Wu, Yifeng Li