Olexa Bilaniuk, Sean Wagner, Yvon Savaria, Jean‐Pierre David
Deep Neural Networks (DNNs) have become the state of the art in several domains, such as computer vision and speech recognition. However, the use of DNNs in embedded applications remains strongly limited by their complexity and by the energy required to process large data sets. In this paper, we present the architecture of an accelerator for quantized neural networks and its implementation on a Nallatech 385-A7 board with an Altera Stratix V GX A7 FPGA. The accelerator's design centers on the matrix-vector product as the key primitive and exploits bit-slicing to extract maximum performance from low-precision arithmetic.
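The abstract does not describe the datapath, so the following is only a minimal Python sketch of the general bit-slicing idea, assuming unsigned integer weights and activations. It is not the accelerator's actual design, and the function names (`to_bit_planes`, `bitsliced_dot`, `bitsliced_matvec`) are hypothetical. Each operand vector is split into bit planes, so a dot product becomes a sum of AND-plus-popcount terms weighted by powers of two: a·x = Σ_i Σ_j 2^(i+j) · popcount(A_i AND X_j), where A_i and X_j are the i-th and j-th bit planes.

```python
from typing import List


def to_bit_planes(values: List[int], bits: int) -> List[int]:
    """Split unsigned integers into bit planes packed as Python ints.

    Plane b holds bit b of every element; element k sits at bit position k.
    Assumption: unsigned quantization, i.e. 0 <= v < 2**bits for every v.
    """
    for v in values:
        if not 0 <= v < (1 << bits):
            raise ValueError(f"{v} does not fit in {bits} unsigned bits")
    planes = []
    for b in range(bits):
        plane = 0
        for k, v in enumerate(values):
            plane |= ((v >> b) & 1) << k  # place bit b of element k at position k
        planes.append(plane)
    return planes


def bitsliced_dot(a_planes: List[int], x_planes: List[int]) -> int:
    """Dot product from bit planes: sum of 2**(i+j) * popcount(A_i & X_j)."""
    acc = 0
    for i, a_plane in enumerate(a_planes):
        for j, x_plane in enumerate(x_planes):
            # AND keeps positions k where bit i of a_k and bit j of x_k are both 1;
            # popcount counts them, and the shift applies the 2**(i+j) weight.
            acc += bin(a_plane & x_plane).count("1") << (i + j)
    return acc


def bitsliced_matvec(matrix: List[List[int]], vector: List[int],
                     w_bits: int, x_bits: int) -> List[int]:
    """Matrix-vector product where every row uses the bit-sliced dot product."""
    x_planes = to_bit_planes(vector, x_bits)  # slice the input vector once
    return [bitsliced_dot(to_bit_planes(row, w_bits), x_planes) for row in matrix]


if __name__ == "__main__":
    W = [[3, 1, 2], [0, 2, 1]]  # 2-bit weights
    x = [1, 2, 3]               # 2-bit activations
    print(bitsliced_matvec(W, x, w_bits=2, x_bits=2))  # [11, 7]
```

The input vector is sliced once and its planes are reused for every row. Lowering the precision of either operand reduces the number of plane pairs (w_bits × x_bits) and hence the work per dot product. Signed values are not handled in this sketch; two's complement operands would need the most significant plane weighted negatively.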
Jiaming Huang, Junyan Yang, Saisai Nui, Hang Yi, Wei Wang, Hai‐Bao Chen
Muhammad Rifqi Daffa Sudrajat, Trio Adiono, Infall Syafalni
Sungju Ryu, Hyungjun Kim, Wooseok Yi, Eunhwan Kim, Yulhwa Kim, Taesu Kim, Jae‐Joon Kim