Large models have attracted growing attention and adoption in recent years. From large-scale convolutional neural network (CNN) models to vision transformers (ViTs), which contain far more parameters, big models are being applied to an ever wider range of tasks. Accuracy requirements are a major driver of this trend, and the rapid development of computing hardware, which makes training large-scale models feasible, in turn stimulates further demand for model accuracy. For mobile and edge computing, however, there is an inevitable trade-off between accuracy and real-time inference due to the limited computing power of terminal devices. To make this trade-off more favourable, we propose Knowledge Distillation with Channel Dropout Strategy (CDKD), which combines intermediate and final distillation with a channel dropout strategy, improving the accuracy of a small model without increasing its parameter count or FLOPs. In experiments on the COCO2017 validation set, a ResNet-18 model trained with our method outperformed the baseline model under the same setup.
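The abstract does not give the exact formulation of the channel dropout strategy, but the core idea of dropping feature channels when matching intermediate student and teacher representations can be sketched as below. This is a minimal illustration, not the paper's method: the function names, the `drop_prob` parameter, and the choice of a shared mask with a mean-squared-error loss over the kept channels are all assumptions for illustration.

```python
import numpy as np


def channel_dropout_mask(num_channels, drop_prob, rng):
    # Sample a binary keep/drop mask over channels, guaranteeing
    # that at least one channel survives the dropout.
    mask = (rng.random(num_channels) >= drop_prob).astype(np.float32)
    if mask.sum() == 0:
        mask[rng.integers(num_channels)] = 1.0
    return mask


def distill_loss(student_feat, teacher_feat, drop_prob=0.3, rng=None):
    # student_feat, teacher_feat: intermediate feature maps of shape (C, H, W).
    # A single channel mask is shared by both feature maps, so the
    # distillation loss is computed only over the randomly kept channels.
    rng = rng or np.random.default_rng(0)
    num_channels = student_feat.shape[0]
    mask = channel_dropout_mask(num_channels, drop_prob, rng)
    diff = (student_feat - teacher_feat) * mask[:, None, None]
    # Mean squared error normalized by the number of kept elements.
    return float((diff ** 2).sum() / (mask.sum() * student_feat[0].size))
```

In this sketch the mask is resampled every call, so across training iterations the student is forced to match the teacher on varying channel subsets rather than overfitting to a fixed alignment.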
Yang Zhou, Xiaofeng Gu, Rong Fu, Na Li, Xuemei Du, Ping Kuang
Jiu Yi, Haoyuan Liu, Hiroshi Watanabe
Wei Herng Yap, Rui Cao, Sim Kuan Goh