JOURNAL ARTICLE

Video Action Recognition in SoC FPGAs Driven by Neural Architecture Search

Abstract

This work presents a hardware-aware Neural Architecture Search (NAS) framework for video-based human action recognition, targeting real-time deployment on FPGAbased System-on-Chip (SoC) platforms. The proposed method explores a constrained search space of Convolutional Neural Network (CNN)-Recurrent Neural Network (RNN) architectures aligned with a hardware-software pipeline where CNNs are mapped to FPGA Deep Learning Processing Units (DPUs) and RNNs to embedded ARM cores. A reinforcement learning (RL)-based controller, guided by a position-based discounted reward strategy, progressively learns to generate architectures that emphasize high-impact design decisions. Experiments on the UCF101 dataset demonstrate that the proposed architectures achieve 81.07 % accuracy, among the highest reported for CNNRNN models relying exclusively on spatial information. The results validate the effectiveness of the proposed framework in driving hardware-compatible and performance-optimized architecture exploration.

Keywords:

Metrics

0
Cited By
0.00
FWCI (Field Weighted Citation Impact)
0
Refs
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Topics

© 2026 ScienceGate Book Chapters — All rights reserved.