Iterative pruning is one of the most effective compression methods for pre-trained language models. We show that finding the optimal pruning decision can be formulated as an equality-constrained 0-1 Integer Linear Programming problem. The solution to this optimization problem yields a principled importance criterion, which we use to rank parameters during iterative model pruning. To mitigate poor generalization at high sparsity levels, we propose a self-regularization scheme in which the model's predictions are regularized by those of its latest checkpoint, whose sparsity increases as pruning proceeds. Our experiments on natural language understanding, question answering, named entity recognition, and data-to-text generation with various Transformer-based PLMs demonstrate the effectiveness of the approach at various sparsity levels.
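To make the procedure concrete, below is a minimal sketch (not the authors' released code) of an iterative pruning loop with self-regularization, written for a HuggingFace-style classification model. The cubic sparsity schedule, the first-order `importance_score` used as a stand-in for the ILP-derived criterion, and the hyperparameters `alpha` and `ckpt_every` are all illustrative assumptions, not details from the paper.

```python
# Illustrative sketch of iterative importance-based pruning with self-regularization.
import copy
import torch
import torch.nn.functional as F

def sparsity_at(step, total_steps, final_sparsity):
    # Cubic sparsity schedule (a common choice in gradual pruning; assumed here).
    progress = min(step / total_steps, 1.0)
    return final_sparsity * (1.0 - (1.0 - progress) ** 3)

def importance_score(param):
    # Placeholder first-order criterion |w * grad|, standing in for the
    # ILP-derived importance criterion described in the abstract.
    return (param.data * param.grad).abs()

def apply_global_mask(model, sparsity):
    # Rank all prunable weights globally by importance and zero the lowest-ranked fraction.
    scores, params = [], []
    for _, p in model.named_parameters():
        if p.dim() > 1 and p.grad is not None:  # prune weight matrices only
            scores.append(importance_score(p).flatten())
            params.append(p)
    all_scores = torch.cat(scores)
    k = int(sparsity * all_scores.numel())
    if k == 0:
        return
    threshold = torch.kthvalue(all_scores, k).values
    for p in params:
        p.data.mul_((importance_score(p) > threshold).float())

def train_with_self_regularization(model, loader, optimizer, total_steps,
                                   final_sparsity=0.9, alpha=0.5, ckpt_every=500):
    teacher = copy.deepcopy(model).eval()  # latest (denser) checkpoint
    step = 0
    for inputs, labels in loader:
        outputs = model(**inputs)  # assumes a HuggingFace-style model returning .logits
        task_loss = F.cross_entropy(outputs.logits, labels)
        with torch.no_grad():
            teacher_logits = teacher(**inputs).logits
        # Self-regularization: match current predictions to the latest
        # checkpoint's predictions via KL divergence on soft labels.
        reg_loss = F.kl_div(F.log_softmax(outputs.logits, dim=-1),
                            F.softmax(teacher_logits, dim=-1),
                            reduction="batchmean")
        loss = (1 - alpha) * task_loss + alpha * reg_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Re-prune after the update so removed weights stay at zero.
        apply_global_mask(model, sparsity_at(step, total_steps, final_sparsity))
        step += 1
        if step % ckpt_every == 0:
            teacher = copy.deepcopy(model).eval()  # refresh teacher with the sparser checkpoint
        if step >= total_steps:
            break
```

In this sketch the regularization target is periodically replaced by the current, sparser model, so the teacher's sparsity grows along with the student's, mirroring the "latest checkpoint with increasing sparsity" idea at a high level.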