Sungwook Lee, Seunghyun Lee, Byung Cheol Song
Recent knowledge distillation (KD) methods have successfully supervised a student model to learn better representations from the intermediate layers of a teacher model. However, previous KD methods fail to obtain knowledge that generalizes across object scales in a one-stage object detector, because such detectors structurally rely on several intermediate layers to detect objects of different scales. In other words, previous KD methods cannot distill and transfer knowledge to the intermediate layers of a one-stage detector in a balanced way. We therefore propose a shared knowledge encoder and an averaged prototype transfer to remove or mitigate the distillation and transfer imbalances that degrade the KD process. Experimental results show that the proposed KD method outperforms state-of-the-art methods: it yields about 1.3% and 2.2% higher accuracy than the baseline on the PASCAL VOC and MS COCO datasets, respectively.
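To make the idea concrete, below is a minimal PyTorch sketch of scale-balanced distillation with a shared encoder and averaged prototypes. This is not the authors' released code: the names `SharedEncoder` and `averaged_prototype_loss`, the embedding dimension, and the assumption that teacher and student feature maps share the same channel width are all illustrative choices.

```python
# A minimal sketch of the idea described above, NOT the authors' implementation.
# All module names and hyper-parameters here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEncoder(nn.Module):
    """One encoder shared across all pyramid levels, so knowledge is
    distilled from every scale with the same parameters instead of a
    separate head per level (the 'balanced distillation' intuition)."""
    def __init__(self, in_channels: int, embed_dim: int = 128):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_channels, embed_dim, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(embed_dim, embed_dim, kernel_size=3, padding=1),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # Global average pooling turns each level's map into one prototype.
        return self.proj(feat).mean(dim=(2, 3))  # shape: (B, embed_dim)

def averaged_prototype_loss(student_feats, teacher_feats, encoder):
    """Encode every pyramid level, average the per-level prototypes, and
    match student to teacher so no single scale dominates the transfer."""
    s_proto = torch.stack([encoder(f) for f in student_feats]).mean(dim=0)
    with torch.no_grad():  # teacher provides targets only; no gradients
        t_proto = torch.stack([encoder(f) for f in teacher_feats]).mean(dim=0)
    return F.mse_loss(s_proto, t_proto)

# Usage with dummy multi-scale features (e.g., three FPN levels).
encoder = SharedEncoder(in_channels=256)
student = [torch.randn(2, 256, s, s, requires_grad=True) for s in (64, 32, 16)]
teacher = [torch.randn(2, 256, s, s) for s in (64, 32, 16)]
loss = averaged_prototype_loss(student, teacher, encoder)
loss.backward()
```

Averaging the per-level prototypes, rather than summing per-level losses, is one plausible way to keep any single scale from dominating the transfer; the actual formulation in the paper may differ.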