JOURNAL ARTICLE

Class Attention Transfer Based Knowledge Distillation

Abstract

Previous knowledge distillation methods have shown impressive performance on model compression tasks; however, it is hard to explain how the knowledge they transfer helps to improve the performance of the student network. In this work, we focus on proposing a knowledge distillation method that has both high interpretability and competitive performance. We first revisit the structure of mainstream CNN models and reveal that possessing the capacity to identify class-discriminative regions of the input is critical for a CNN to perform classification. Furthermore, we demonstrate that this capacity can be obtained and enhanced by transferring class activation maps. Based on our findings, we propose class attention transfer based knowledge distillation (CAT-KD). Different from previous KD methods, we explore and present several properties of the knowledge transferred by our method, which not only improve the interpretability of CAT-KD but also contribute to a better understanding of CNNs. While having high interpretability, CAT-KD achieves state-of-the-art performance on multiple benchmarks. Code is available at: https://github.com/GzyAftermath/CAT-KD.
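The core mechanism the abstract describes — transferring the teacher's class activation maps (CAMs) so the student learns to locate the same class-discriminative regions — can be sketched in a few lines. Below is a minimal, hypothetical PyTorch sketch, not the authors' released implementation: the CAM construction assumes a standard global-average-pooling classifier head, and the pooling size, normalization, and MSE loss form are illustrative assumptions (see the linked repository for the exact method).

```python
import torch
import torch.nn.functional as F

def class_activation_maps(feat, fc_weight):
    """Compute one CAM per class from the last conv features.

    feat:      (B, C, H, W) feature maps before global average pooling
    fc_weight: (K, C) weights of the final linear classifier
    returns:   (B, K, H, W) class activation maps
    """
    # Each CAM is a weighted sum of feature channels, using the
    # classifier weights of the corresponding class.
    return torch.einsum('bchw,kc->bkhw', feat, fc_weight)

def cat_loss(feat_s, fc_s, feat_t, fc_t, pool_size=2):
    """Illustrative class-attention-transfer loss: align pooled,
    normalized CAMs of student and teacher with an MSE penalty.
    (pool_size and the normalization scheme are assumptions.)"""
    cam_s = class_activation_maps(feat_s, fc_s)
    cam_t = class_activation_maps(feat_t, fc_t)
    # Down-sample the maps so only coarse class-discriminative
    # regions are matched; this also lets student and teacher
    # features have different spatial sizes.
    cam_s = F.adaptive_avg_pool2d(cam_s, pool_size)
    cam_t = F.adaptive_avg_pool2d(cam_t, pool_size)
    # L2-normalize each map so the transfer is scale-invariant.
    cam_s = F.normalize(cam_s.flatten(2), dim=2)
    cam_t = F.normalize(cam_t.flatten(2), dim=2)
    # Detach the teacher so gradients flow only into the student.
    return F.mse_loss(cam_s, cam_t.detach())

# Example with hypothetical shapes: a 100-class head, wider teacher.
feat_t = torch.randn(8, 256, 8, 8)   # teacher features
feat_s = torch.randn(8, 64, 8, 8)    # student features
loss = cat_loss(feat_s, torch.randn(100, 64),
                feat_t, torch.randn(100, 256))
```

In this sketch the CAM transfer term would be added to the usual cross-entropy loss with a weighting coefficient; the detached teacher maps act as fixed targets for where each class should activate.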

Keywords:
Interpretability, Computer science, Discriminative model, Distillation, Class, Machine learning, Artificial intelligence, Knowledge transfer, Code, Knowledge management, Programming language

Metrics

Cited By: 100
FWCI (Field Weighted Citation Impact): 18.20
References: 49
Citation Normalized Percentile: 0.99 (top 1%)

Topics

Advanced Neural Network Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Domain Adaptation and Few-Shot Learning (Physical Sciences → Computer Science → Artificial Intelligence)
Adversarial Robustness in Machine Learning (Physical Sciences → Computer Science → Artificial Intelligence)

Related Documents

BOOK-CHAPTER

Adaptive Multi-teacher Knowledge Distillation with Class Attention Transfer

Xin Cheng, Jinjia Zhou

Communications in Computer and Information Science, 2025, Pages: 210-224

JOURNAL ARTICLE

Attention and feature transfer based knowledge distillation

Guoliang Yang, Shuaiying Yu, Yangyang Sheng, Hao Yang

Scientific Reports, 2023, Vol. 13 (1), Article 18369

JOURNAL ARTICLE

Artistic Style Transfer Based on Attention with Knowledge Distillation

Hanadi Al‐Mekhlafi, Shiguang Liu

Computer Graphics Forum, 2024, Vol. 43 (6)

JOURNAL ARTICLE

Hierarchical Multi-Attention Transfer for Knowledge Distillation

Jianping Gou, Liyuan Sun, Baosheng Yu, Shaohua Wan, Dacheng Tao

ACM Transactions on Multimedia Computing, Communications and Applications, 2022, Vol. 20 (2), Pages: 1-20