Deep neural network inference on energy-harvesting tiny devices has emerged as a solution for sustainable edge intelligence. However, compact models optimized for continuously powered systems may become suboptimal when deployed on intermittently powered systems. This paper presents the pruning criterion, pruning strategy, and prototype implementation of iPrune, the first framework that incorporates intermittency into neural network pruning to produce compact models adaptable to intermittent systems. The pruned models are deployed and evaluated on a Texas Instruments device under various power strengths and across several TinyML applications. Compared to an energy-aware pruning framework, iPrune speeds up intermittent inference by 1.1 to 2 times while achieving comparable model accuracy.