JOURNAL ARTICLE

Adaptive Hierarchy-Branch Fusion for Online Knowledge Distillation

Linrui Gong, Shaohui Lin, Baochang Zhang, Yunhang Shen, Ke Li, Ruizhi Qiao, Bo Ren, Mu-Qing Li, Yu Zhou, Lizhuang Ma

Journal: Proceedings of the AAAI Conference on Artificial Intelligence   Year: 2023   Vol: 37 (6)   Pages: 7731-7739   Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Online Knowledge Distillation (OKD) is designed to alleviate the dilemma that a high-capacity pre-trained teacher model is not available. However, existing methods mostly focus on improving the ensemble prediction accuracy of multiple students (a.k.a. branches) and often overlook the homogenization problem, which makes the student models saturate quickly and hurts performance. We assume that the intrinsic bottleneck behind the homogenization problem is the identical branch architecture and the coarse ensemble strategy. We propose a novel Adaptive Hierarchy-Branch Fusion framework for Online Knowledge Distillation, termed AHBF-OKD, which designs hierarchical branches and an adaptive hierarchy-branch fusion module to boost model diversity and aggregate complementary knowledge. Specifically, we first introduce hierarchical branch architectures to construct diverse peers by monotonically increasing the depth of the branches on the basis of the target branch. To effectively transfer knowledge from the most complex branch to the simplest target branch, we propose an adaptive hierarchy-branch fusion module that creates hierarchical teacher assistants recursively, regarding the target branch as the smallest teacher assistant. During training, the teacher assistant from the previous hierarchy is explicitly distilled by the teacher assistant and the branch of the current hierarchy. Thus, importance scores are effectively and adaptively allocated to different branches to reduce branch homogenization. Extensive experiments demonstrate the effectiveness of AHBF-OKD on different datasets, including CIFAR-10/100 and ImageNet 2012. For example, on ImageNet 2012, the distilled ResNet-18 achieves a Top-1 error of 29.28%, which significantly outperforms state-of-the-art methods. The source code is available at https://github.com/linruigong965/AHBF.
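To make the recursive fusion-and-distillation idea from the abstract concrete, below is a minimal PyTorch sketch of one training step's loss. It assumes logit-level fusion with softmax-normalized importance scores and a temperature-scaled KL distillation term; the function names (`fuse`, `kd_loss`, `ahbf_losses`) and hyperparameters are illustrative assumptions, not the authors' released implementation (see the linked GitHub repository for that).

```python
# Minimal sketch (not the authors' released code) of the recursive
# hierarchy-branch fusion described in the abstract. The logit-level
# fusion, 2-way softmax weighting, and KL-based distillation loss are
# assumed forms for illustration only.
import torch
import torch.nn.functional as F


def fuse(prev_ta_logits, branch_logits, alpha):
    """Fuse the previous teacher assistant with the current branch
    using adaptively learned importance scores (2-way softmax)."""
    w = torch.softmax(alpha, dim=0)
    return w[0] * prev_ta_logits + w[1] * branch_logits


def kd_loss(student_logits, teacher_logits, T=3.0):
    """Temperature-scaled KL distillation loss (assumed standard form)."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1).detach(),
        reduction="batchmean",
    ) * T * T


def ahbf_losses(branch_logits_list, targets, alphas):
    """branch_logits_list[0] is the simplest (target) branch; deeper
    branches follow. alphas[k] are learnable fusion scores for
    hierarchy k. Returns cross-entropy plus recursive distillation."""
    ta = branch_logits_list[0]            # target branch = smallest teacher assistant
    loss = F.cross_entropy(ta, targets)
    for k in range(1, len(branch_logits_list)):
        loss += F.cross_entropy(branch_logits_list[k], targets)
        new_ta = fuse(ta, branch_logits_list[k], alphas[k - 1])
        # the previous-hierarchy TA is distilled by the current fusion
        loss += kd_loss(ta, new_ta)
        ta = new_ta
    return loss
```

In this sketch, knowledge flows from the deepest branch down to the target branch through the chain of fused teacher assistants, while the learnable scores `alphas` adapt how much each branch contributes at every hierarchy level.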

Keywords:
Computer science, Hierarchy, Artificial intelligence, Class hierarchy, Bottleneck, Distillation, Machine learning, Programming language

Metrics

Cited by: 10
FWCI (Field-Weighted Citation Impact): 1.44
References: 57
Citation Normalized Percentile: 0.80


Topics

Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Brain Tumor Detection and Classification
Life Sciences →  Neuroscience →  Neurology
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Many-objective evolutionary self-knowledge distillation with adaptive branch fusion method

Jiayuan Bai, Yi Zhang

Journal: Information Sciences   Year: 2024   Vol: 669   Pages: 120586-120586
JOURNAL ARTICLE

Diversified branch fusion for self-knowledge distillation

Zuxiang Long, Fuyan Ma, Bin Sun, Mingkui Tan, Shutao Li

Journal: Information Fusion   Year: 2022   Vol: 90   Pages: 12-22
JOURNAL ARTICLE

Layer-fusion for online mutual knowledge distillation

Gan Hu, Yanli Ji, Xingzhu Liang, Yuexing Han

Journal: Multimedia Systems   Year: 2022   Vol: 29 (2)   Pages: 787-796