Model compression plays a vital role in deploying neural networks (NNs) on resource-constrained devices. Conventional rule-based NN pruning is sub-optimal because the enormous design space cannot be examined entirely by hand. To overcome this issue, automated NN pruning leverages a reinforcement learning (RL) agent to find the best combination of parameters to remove from a given model. We propose a novel RL-based automated pruning algorithm that, unlike existing RL-based methods, determines the environmental variables using a State Predictor Network as a simulated environment instead of validating the pruned model at run time. Testing our method on the YOLOv4 detector, we produced a model with 49% sparsity and 7.2% higher mAP. This result outperforms our handcrafted pruning methods for YOLOv4 by 2.3% mAP and 17.1% sparsity. Regarding total development time, our method is 146.2 times faster than the state-of-the-art PuRL method on an NVIDIA Titan X GPU. The implementation of the proposed solution is available at: https://github.com/bencsikb/Efficient_RLPruning
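The central idea of the abstract — replacing run-time validation of each pruned candidate with a learned state predictor inside the RL loop — can be sketched roughly as follows. This is a minimal illustration only: the state layout, the number of layers, the linear "predictor", and the random policy are all assumptions for demonstration, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

N_LAYERS = 4   # hypothetical: one pruning action per layer
STATE_DIM = 3  # hypothetical state features observed by the agent

# Stand-in for the State Predictor Network: a fixed linear map from
# (current state, chosen sparsity) to the next state, so the RL loop
# never prunes and validates the real model.
W = rng.normal(scale=0.1, size=(STATE_DIM + 1, STATE_DIM))

def predict_next_state(state, action):
    """Simulated environment step: predict the next environmental
    state instead of evaluating the pruned network at run time."""
    x = np.concatenate([state, [action]])
    return x @ W

def episode(policy):
    """Roll out one pruning episode entirely against the predictor."""
    state = np.zeros(STATE_DIM)
    actions = []
    for _ in range(N_LAYERS):
        a = policy(state)            # per-layer sparsity ratio
        actions.append(a)
        state = predict_next_state(state, a)
    return actions, state

# Placeholder policy; a real agent would be trained on the rewards.
random_policy = lambda s: float(rng.uniform(0.0, 0.8))
actions, final_state = episode(random_policy)
```

The point of the sketch is the control flow: each environment step costs a single forward pass of the predictor rather than a prune-and-validate cycle on the full detector, which is where the reported development-time speed-up comes from.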