Deepak Kadetotad, Visar Berisha, Chaitali Chakrabarti, Jae-sun Seo
Long short-term memory (LSTM) networks are widely used for speech applications but pose difficulties for efficient implementation on hardware due to large weight storage requirements. We present an energy-efficient LSTM recurrent neural network (RNN) accelerator, featuring an algorithm-hardware co-optimized memory compression technique called hierarchical coarse-grain sparsity (HCGS). Aided by HCGS-based block-wise recursive weight compression, we demonstrate LSTM networks with up to 16× fewer weights while achieving minimal accuracy loss. The prototype chip fabricated in 65-nm LP CMOS achieves 8.93/7.22 TOPS/W for 2-/3-layer LSTM RNNs trained with HCGS for TIMIT/TED-LIUM corpora.
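The block-wise recursive compression described above can be illustrated with a minimal sketch of a two-level coarse-grain mask. This is not the authors' implementation: the function name `hcgs_mask`, the block sizes (64 and 16), the 4× drop factor per level, and the random block selection are all assumptions chosen so that the two levels compound to the 16× reduction quoted in the abstract.

```python
import numpy as np

def hcgs_mask(rows, cols, big, small, drop, rng):
    """Build a two-level block-sparse boolean mask (illustrative only).

    Level 1: in every row of big blocks, keep 1/drop of the big blocks.
    Level 2: inside each kept big block, in every row of small blocks,
             keep 1/drop of the small blocks.
    """
    mask = np.zeros((rows, cols), dtype=bool)
    n_big_cols = cols // big      # big blocks per block row
    n_small = big // small        # small blocks per side of a big block
    for br in range(rows // big):
        kept_big = rng.choice(n_big_cols, size=n_big_cols // drop, replace=False)
        for bc in kept_big:
            r0, c0 = br * big, bc * big
            for sr in range(n_small):
                kept_small = rng.choice(n_small, size=n_small // drop, replace=False)
                for sc in kept_small:
                    rs, cs = r0 + sr * small, c0 + sc * small
                    mask[rs:rs + small, cs:cs + small] = True
    return mask

rng = np.random.default_rng(0)
mask = hcgs_mask(512, 512, big=64, small=16, drop=4, rng=rng)  # assumed sizes
weights = rng.standard_normal((512, 512)) * mask  # only masked-in weights survive
print(mask.size / mask.sum())  # 16.0
```

Because whole blocks are kept or dropped, only the surviving blocks and a small per-block index would need to be stored, which is presumably what makes this kind of structured sparsity attractive for on-chip weight memory.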
Deepak Kadetotad, Shihui Yin, Visar Berisha, Chaitali Chakrabarti, Jae-sun Seo
Deepak Kadetotad, Jian Meng, Visar Berisha, Chaitali Chakrabarti, Jae-sun Seo
Khaled Humood, Patrick Foster, Shiwei Wang, Alexander Serb, Themis Prodromakis
Chen-Han Hsu, Yu-Hsiang Cheng, Zhaofang Li, Ping-Li Huang, Kea‐Tiong Tang