Autoregressive convolutional neural networks (CNNs) have been widely exploited for sequence generation tasks such as audio synthesis, language modeling, and neural machine translation. WaveNet is a deep autoregressive CNN composed of several stacked layers of dilated convolution that is used for sequence generation. While WaveNet produces state-of-the-art audio generation results, the naive inference implementation is quite slow; it takes a few minutes to generate just one second of audio on a high-end GPU. In this work, we develop the first accelerator platform~\textit{FastWave} for autoregressive convolutional neural networks, and address the associated design challenges. We design the Fast-Wavenet inference model in Vivado HLS and perform a wide range of optimizations including fixed-point implementation, array partitioning, and pipelining. Our model uses a fully parameterized parallel architecture for fast matrix-vector multiplication that enables per-layer customized latency fine-tuning for further throughput improvement. Our experiments comparatively assess the trade-off between throughput and resource utilization for various optimizations. Our best WaveNet design on the Xilinx XCVU13P FPGA, which uses only on-chip memory, achieves 66$\times$ faster generation speed than a CPU implementation and 11$\times$ faster generation speed than a GPU implementation.
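The core optimizations named above (array partitioning plus a pipelined, unrolled matrix-vector multiply) can be sketched in HLS-style C++. This is a minimal illustration, not the paper's actual kernel: the dimensions, pragma placement, and function name `matvec` are all hypothetical, and fixed-point types are replaced with `float` for brevity.

```cpp
#include <cassert>

// Hypothetical layer dimensions; the paper's per-layer sizes are not
// given in the abstract.
constexpr int ROWS = 4;
constexpr int COLS = 4;

// HLS-style matrix-vector multiply: partitioning W along its columns
// and x completely lets the unrolled inner loop read all operands in
// one cycle, so the pipelined outer loop can emit one output per cycle
// once the pipeline fills. Pragmas are shown as comments so the sketch
// also compiles as plain C++.
void matvec(const float W[ROWS][COLS], const float x[COLS],
            float y[ROWS]) {
// #pragma HLS ARRAY_PARTITION variable=W complete dim=2
// #pragma HLS ARRAY_PARTITION variable=x complete
ROW_LOOP:
    for (int r = 0; r < ROWS; ++r) {
// #pragma HLS PIPELINE II=1
        float acc = 0.0f;
    COL_LOOP:
        for (int c = 0; c < COLS; ++c) {
// #pragma HLS UNROLL
            acc += W[r][c] * x[c];
        }
        y[r] = acc;
    }
}
```

In an actual Vivado HLS flow, the unroll factor of the inner loop is the per-layer parallelism knob the abstract alludes to: larger factors trade DSP and BRAM usage for lower latency on that layer.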