In conventional neural network computation, data movement accounts for a substantial share of the dataflow. Rather than relying primarily on time-consuming MAC operations, this study emphasizes inference within the memory, greatly reducing the volume of data that computational units must repeatedly read and that would otherwise burden the network with extra processing. The hardware architecture is designed to minimize time spent on data transfer through strategic buffer configurations, efficient data exchange between computational units, and careful dataflow organization. In particular, this work focuses on maximizing the reuse of data after it has been read. The implementation results walk through the complete object detection pipeline and analyze the time spent in each image processing stage. Finally, the study concludes with a comparative evaluation against several well-known works in the field.
Junye Si, Jianfei Jiang, Qin Wang, Jia Huang
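The data-reuse emphasis described above can be illustrated with a minimal software sketch of output-stationary tiling, in which an input tile is fetched from main memory once into a local buffer and then reused by every filter before the next tile is loaded. This sketch is an assumption for illustration only; the function name, tile sizes, and loop structure are hypothetical and are not taken from the paper's hardware design.

```python
import numpy as np

def tiled_conv_with_reuse(ifmap, weights, tile_h=8, tile_w=8):
    """Illustrative tiled convolution: each input tile is read from
    'main memory' (the ifmap array) once into a local buffer and then
    reused across every filter, instead of being re-read per MAC pass."""
    C, H, W = ifmap.shape
    K, _, R, S = weights.shape                     # K filters of size C x R x S
    out = np.zeros((K, H - R + 1, W - S + 1))

    for th in range(0, out.shape[1], tile_h):
        for tw in range(0, out.shape[2], tile_w):
            # One read of the input region covering this output tile ...
            tile = ifmap[:, th:th + tile_h + R - 1, tw:tw + tile_w + S - 1]
            # ... reused by all K filters before the next tile is fetched.
            for k in range(K):
                for oh in range(min(tile_h, out.shape[1] - th)):
                    for ow in range(min(tile_w, out.shape[2] - tw)):
                        patch = tile[:, oh:oh + R, ow:ow + S]
                        out[k, th + oh, tw + ow] = np.sum(patch * weights[k])
    return out

# Example: 3-channel 32x32 input, 16 filters of size 3x3
ifmap = np.random.rand(3, 32, 32)
weights = np.random.rand(16, 3, 3, 3)
print(tiled_conv_with_reuse(ifmap, weights).shape)   # (16, 30, 30)
```

Under this reuse pattern, each input element is fetched from the large array once per tile rather than once per filter, which is the same reduction in repeated reads that the buffer configuration and dataflow organization described in the abstract aim to achieve in hardware.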