Hui Luo, Jianghao Rao, Jianlin Zhang
As one of the mainstream model compression techniques, network pruning has received wide attention. Existing network pruning methods mainly fall into weight pruning, channel pruning, and other emerging variants. While weight pruning usually achieves the highest compression ratio, its irregular structure incurs high index-storage and decoding overhead. Channel pruning, as a structured pruning method, has been widely studied; however, it tends to leave more residual redundant parameters. To achieve a better trade-off between compression ratio and regular sparsity in the pruned model, in this paper we propose a novel type of network pruning named block-wise pruning (BWP). Inspired by group convolution, BWP performs pruning at a new granularity, which helps identify and remove more of these residuals. Moreover, we propose a new interval-constrained penalty term that, given a pruning threshold, drives more parameters toward sparsity and thus yields a higher compression ratio. We evaluate the effectiveness of our method on popular benchmark datasets, where it shows clear superiority over several state-of-the-art methods. For example, with VGG-16 on CIFAR-10, we achieve an 86.48% FLOPs reduction by removing 95.86% of the parameters, with only a 0.30% loss in accuracy. With ResNet-34 on ImageNet, we achieve an 82.27% FLOPs reduction by removing 58.60% of the parameters, with losses of only 1.95% in top-1 accuracy and 1.47% in top-5 accuracy.
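To make the block-granularity idea concrete, the following is a minimal NumPy sketch of block-wise magnitude pruning: a 2-D weight matrix is tiled into fixed-size blocks and whole blocks with small norm are zeroed. The block shape, the Frobenius-norm criterion, the threshold value, and the function name `block_wise_prune` are illustrative assumptions, not the paper's exact BWP procedure or its interval-constrained penalty term.

```python
import numpy as np

def block_wise_prune(weight, block_rows, block_cols, threshold):
    """Zero out whole blocks of a 2-D weight matrix whose Frobenius norm
    falls below `threshold`. A convolutional kernel of shape
    (out_ch, in_ch, k, k) can first be flattened to (out_ch, in_ch * k * k).
    Hypothetical sketch of block-granularity pruning, not the paper's method."""
    out_dim, in_dim = weight.shape
    pruned = weight.copy()
    mask = np.ones_like(weight, dtype=bool)
    for r in range(0, out_dim, block_rows):
        for c in range(0, in_dim, block_cols):
            block = weight[r:r + block_rows, c:c + block_cols]
            if np.linalg.norm(block) < threshold:
                # Remove the entire block, keeping the sparsity pattern regular.
                pruned[r:r + block_rows, c:c + block_cols] = 0.0
                mask[r:r + block_rows, c:c + block_cols] = False
    return pruned, mask

# Example: prune a flattened 3x3 conv layer with 64 input / 128 output channels.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=(128, 64 * 3 * 3))
w_pruned, keep_mask = block_wise_prune(w, block_rows=8, block_cols=9, threshold=0.45)
print("block sparsity:", 1.0 - keep_mask.mean())
```

Because pruning decisions are made per block rather than per weight, the surviving parameters stay in regular tiles, which is what avoids the per-element index storage and decoding overhead of unstructured weight pruning.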