Convolutional Neural Networks (CNNs) have demonstrated remarkable performance in artificial intelligence (AI) systems. However, CNNs often require tens or even hundreds of layers with millions of parameters to achieve state-of-the-art accuracy, which hinders deployment in resource-limited scenarios. Meanwhile, the parameters and feature-map data are usually sparse, which leads to useless computation as well as unbalanced workloads. To solve these problems, we propose a computation-efficient hardware architecture. To reduce computational redundancy, we filter out zero-valued weights and zero-valued feature maps. To reduce redundant memory consumption, we propose a memory-division and data-reuse mechanism. To resolve load imbalance, we implement a near-zero-cost scheduling-switching strategy. Experimental results show that our architecture saves, on average, 22.6% of memory access time and 60.5% of computing time compared with a state-of-the-art NN accelerator.
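The zero-filtering idea above can be illustrated in software. The sketch below is not the paper's hardware architecture: it is a minimal, assumed 1-D convolution in Python that skips a multiply-accumulate (MAC) whenever either operand is zero, which is the arithmetic saving that zero-skipping hardware exploits on sparse weights and activations.

```python
import numpy as np

def sparse_conv1d(inputs, weights):
    """Naive 1-D convolution (cross-correlation) that skips MACs with a
    zero-valued weight or activation. Illustrative only: real accelerators
    operate on compressed index streams, not Python loops."""
    out_len = len(inputs) - len(weights) + 1
    out = np.zeros(out_len)
    macs = 0  # multiply-accumulates actually performed
    for i in range(out_len):
        acc = 0.0
        for j, w in enumerate(weights):
            x = inputs[i + j]
            if w == 0.0 or x == 0.0:
                continue  # product would be zero: skip the useless MAC
            acc += w * x
            macs += 1
        out[i] = acc
    return out, macs

x = np.array([1.0, 0.0, 2.0, 0.0, 3.0])  # sparse feature map
w = np.array([1.0, 0.0, 1.0])            # sparse weights
out, macs = sparse_conv1d(x, w)
dense_macs = (len(x) - len(w) + 1) * len(w)
print(out, macs, dense_macs)  # only 4 of 9 dense MACs are performed
```

The result is identical to the dense convolution; only the MAC count changes, which is why this optimization saves time and energy without affecting accuracy.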
Yan-jie Gu, Jian Yu, Tieli Sun, Chen Pan, Zhenhao Feng, Liewei Xu, Chang Wu
Hao Xiao, Kaikai Zhao, Guangzhu Liu
Xueming Li, Hongming Huang, Taosheng Chen, Huaien Gao, Xianghong Hu, Xiaoming Xiong