Anakhi Hazarika, Soumyajit Poddar, Hafizur Rahaman
A Convolutional Neural Network (CNN) is a type of deep neural network commonly used for object detection and classification. State-of-the-art hardware for training and inference of CNN architectures requires considerable computation and memory resources. CNNs achieve greater accuracy at the cost of high computational complexity and large power consumption. To optimize memory requirements, processing speed, and power, it is crucial to design a more efficient accelerator architecture for CNN computation. In this work, the overlap of spatially adjacent data is exploited to parallelize data movement. A fast, reconfigurable hardware accelerator architecture, along with an optimized kernel design suitable for a variety of CNN models, is proposed. Our design achieves a 2.1x computational benefit over state-of-the-art accelerator architectures.
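The overlap of spatially adjacent data mentioned above can be illustrated with a small calculation. The sketch below is not the accelerator design described here; it only counts, under the assumption of square K x K kernels sliding horizontally with a fixed stride, how many input elements two neighbouring convolution windows share. The function name `window_overlap` and its return keys are hypothetical, chosen for illustration.

```python
def window_overlap(kernel: int, stride: int) -> dict:
    """Count input elements shared by two horizontally adjacent K x K windows.

    Illustrative assumption: square kernel, horizontal slide by `stride`.
    """
    if kernel <= 0 or stride <= 0:
        raise ValueError("kernel and stride must be positive integers")

    # Columns common to both windows; zero once the stride reaches the kernel width.
    shared_cols = max(kernel - stride, 0)
    shared = kernel * shared_cols      # elements already fetched for the previous window
    total = kernel * kernel            # elements needed by each window
    return {
        "shared": shared,
        "new": total - shared,         # elements that still have to be fetched
        "reuse_fraction": shared / total,
    }


if __name__ == "__main__":
    for k, s in [(3, 1), (5, 1), (3, 2), (3, 3)]:
        print(f"kernel={k}, stride={s}: {window_overlap(k, s)}")
```

For a 3 x 3 kernel with stride 1, six of the nine inputs of each window are shared with the previous one, so only three new values are needed per step. This kind of reuse is what makes overlapping data attractive for reducing or parallelizing data movement.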