JOURNAL ARTICLE

Class Attention Transfer Based Knowledge Distillation

Abstract

Previous knowledge distillation methods have shown impressive performance on model compression tasks; however, it is hard to explain how the knowledge they transfer helps to improve the performance of the student network. In this work, we focus on proposing a knowledge distillation method that has both high interpretability and competitive performance. We first revisit the structure of mainstream CNN models and reveal that possessing the capacity to identify class-discriminative regions of the input is critical for a CNN to perform classification. Furthermore, we demonstrate that this capacity can be obtained and enhanced by transferring class activation maps. Based on our findings, we propose class attention transfer based knowledge distillation (CAT-KD). Different from previous KD methods, we explore and present several properties of the knowledge transferred by our method, which not only improve the interpretability of CAT-KD but also contribute to a better understanding of CNNs. While having high interpretability, CAT-KD achieves state-of-the-art performance on multiple benchmarks. Code is available at: https://github.com/GzyAftermath/CAT-KD.
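The core mechanism the abstract describes — transferring the teacher's class activation maps (CAMs) so the student learns to locate the same class-discriminative regions — can be sketched in a few lines. Below is a minimal, hypothetical PyTorch sketch, not the authors' released implementation: the CAM construction assumes a standard global-average-pooling classifier head, and the pooling size, normalization, and MSE loss form are illustrative assumptions (see the linked repository for the exact method).

```python
import torch
import torch.nn.functional as F

def class_activation_maps(feat, fc_weight):
    """Compute one CAM per class from the last conv features.

    feat:      (B, C, H, W) feature maps before global average pooling
    fc_weight: (K, C) weights of the final linear classifier
    returns:   (B, K, H, W) class activation maps
    """
    # Each CAM is a weighted sum of feature channels, using the
    # classifier weights of the corresponding class.
    return torch.einsum('bchw,kc->bkhw', feat, fc_weight)

def cat_loss(feat_s, fc_s, feat_t, fc_t, pool_size=2):
    """Illustrative class-attention-transfer loss: align pooled,
    normalized CAMs of student and teacher with an MSE penalty.
    (pool_size and the normalization scheme are assumptions.)"""
    cam_s = class_activation_maps(feat_s, fc_s)
    cam_t = class_activation_maps(feat_t, fc_t)
    # Down-sample the maps so only coarse class-discriminative
    # regions are matched; this also lets student and teacher
    # features have different spatial sizes.
    cam_s = F.adaptive_avg_pool2d(cam_s, pool_size)
    cam_t = F.adaptive_avg_pool2d(cam_t, pool_size)
    # L2-normalize each map so the transfer is scale-invariant.
    cam_s = F.normalize(cam_s.flatten(2), dim=2)
    cam_t = F.normalize(cam_t.flatten(2), dim=2)
    # Detach the teacher so gradients flow only into the student.
    return F.mse_loss(cam_s, cam_t.detach())

# Example with hypothetical shapes: a 100-class head, wider teacher.
feat_t = torch.randn(8, 256, 8, 8)   # teacher features
feat_s = torch.randn(8, 64, 8, 8)    # student features
loss = cat_loss(feat_s, torch.randn(100, 64),
                feat_t, torch.randn(100, 256))
```

In this sketch the CAM transfer term would be added to the usual cross-entropy loss with a weighting coefficient; the detached teacher maps act as fixed targets for where each class should activate.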

Keywords:
Interpretability, Computer science, Discriminative model, Distillation, Class, Machine learning, Artificial intelligence, Knowledge transfer, Code, Knowledge management, Programming language

Metrics

Cited By: 100
FWCI (Field Weighted Citation Impact): 18.20
References: 49
Citation Normalized Percentile: 0.99 (top 1%)

Topics

Advanced Neural Network Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Domain Adaptation and Few-Shot Learning (Physical Sciences → Computer Science → Artificial Intelligence)
Adversarial Robustness in Machine Learning (Physical Sciences → Computer Science → Artificial Intelligence)

Related Documents

BOOK-CHAPTER

Adaptive Multi-teacher Knowledge Distillation with Class Attention Transfer

Xin Cheng, Jinjia Zhou

Communications in Computer and Information Science, 2025, Pages: 210-224

JOURNAL ARTICLE

Attention and feature transfer based knowledge distillation

Guoliang Yang, Shuaiying Yu, Yangyang Sheng, Hao Yang

Scientific Reports, 2023, Vol. 13 (1), Article 18369

JOURNAL ARTICLE

Artistic Style Transfer Based on Attention with Knowledge Distillation

Hanadi Al‐Mekhlafi, Shiguang Liu

Computer Graphics Forum, 2024, Vol. 43 (6)

JOURNAL ARTICLE

Hierarchical Multi-Attention Transfer for Knowledge Distillation

Jianping Gou, Liyuan Sun, Baosheng Yu, Shaohua Wan, Dacheng Tao

ACM Transactions on Multimedia Computing, Communications and Applications, 2022, Vol. 20 (2), Pages: 1-20