Large-scale pre-trained models bring significant gains to many speech-related tasks. However, it remains challenging to deploy these large models when the computing power of terminal devices is limited. Pruning is an effective way to reduce both memory footprint and computation cost. The imperfect evaluation criteria of existing pruning methods and their complex fine-tuning procedures lead to a relatively high loss of accuracy. To address these problems, we propose a structured pruning method that introduces the upper confidence bound (UCB) of importance scores to assess the potential of each model component more accurately. In addition, we introduce a set of learnable pruning threshold parameters that can be trained via stochastic gradient descent, thereby reducing hyper-parameter tuning. We apply our method to HuBERT models on the automatic speech recognition (ASR) task. Our results show that across all pruning granularities and pruning ratios, our method yields higher accuracy and larger inference speedup ratios. At 60% sparsity, the accuracy of our method degrades by only 0.63%.
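The sketch below illustrates, under stated assumptions, the two ideas mentioned in the abstract: an upper-confidence-bound importance score and a pruning threshold trained by SGD through a soft mask. The exact UCB formulation, the straight-through mask, and all names (`ucb_importance`, `ThresholdMask`, `beta`, `temperature`) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def ucb_importance(mean_score, var_score, n_obs, beta=1.0):
    # UCB of the importance score: running mean plus an uncertainty bonus,
    # so components with few or noisy observations are not pruned too early.
    # (The bonus form and beta are assumptions for illustration.)
    return mean_score + beta * torch.sqrt(var_score / (n_obs + 1e-8))

class ThresholdMask(nn.Module):
    # Differentiable pruning mask: components whose UCB importance falls
    # below a learnable threshold are zeroed; the threshold itself is a
    # parameter updated jointly with the model by stochastic gradient descent.
    def __init__(self, init_threshold=0.0, temperature=0.1):
        super().__init__()
        self.threshold = nn.Parameter(torch.full((1,), init_threshold))
        self.temperature = temperature

    def forward(self, importance):
        soft = torch.sigmoid((importance - self.threshold) / self.temperature)
        hard = (soft > 0.5).float()
        # Straight-through estimator: hard 0/1 mask in the forward pass,
        # gradients flow through the soft sigmoid in the backward pass.
        return hard + (soft - soft.detach())

# Example: masking attention heads of one transformer layer.
num_heads = 12
mask_mod = ThresholdMask()
mean_s = torch.rand(num_heads)         # running mean of per-head importance
var_s = torch.rand(num_heads) * 0.01   # running variance of per-head importance
scores = ucb_importance(mean_s, var_s, n_obs=100.0)
head_mask = mask_mod(scores)           # multiply into per-head outputs
```

In such a scheme, a sparsity target would typically be enforced through an auxiliary regularization term on the mask, so the thresholds adapt per granularity (heads, FFN units, layers) without manual tuning.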