This work presents a hardware-aware Neural Architecture Search (NAS) framework for video-based human action recognition, targeting real-time deployment on FPGAbased System-on-Chip (SoC) platforms. The proposed method explores a constrained search space of Convolutional Neural Network (CNN)-Recurrent Neural Network (RNN) architectures aligned with a hardware-software pipeline where CNNs are mapped to FPGA Deep Learning Processing Units (DPUs) and RNNs to embedded ARM cores. A reinforcement learning (RL)-based controller, guided by a position-based discounted reward strategy, progressively learns to generate architectures that emphasize high-impact design decisions. Experiments on the UCF101 dataset demonstrate that the proposed architectures achieve 81.07 % accuracy, among the highest reported for CNNRNN models relying exclusively on spatial information. The results validate the effectiveness of the proposed framework in driving hardware-compatible and performance-optimized architecture exploration.
Wei PengXiaopeng HongGuoying Zhao
Kangyong YinZhechen HuangSuo QiuZhi WangFengbo TaoWei LiangHaosheng Huang
Shaojin DingTianlong ChenXinyu GongWeiwei ZhaShuicheng Yan