JOURNAL ARTICLE

Adaptive Hierarchy-Branch Fusion for Online Knowledge Distillation

Linrui Gong, Shaohui Lin, Baochang Zhang, Yunhang Shen, Ke Li, Ruizhi Qiao, Bo Ren, Mu-Qing Li, Yu Zhou, Lizhuang Ma

Journal: Proceedings of the AAAI Conference on Artificial Intelligence   Year: 2023   Vol: 37 (6)   Pages: 7731-7739   Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Online Knowledge Distillation (OKD) is designed to alleviate the dilemma that a high-capacity pre-trained teacher model is not available. However, existing methods mostly focus on improving the ensemble prediction accuracy of multiple students (a.k.a. branches) and often overlook the homogenization problem, which makes the student models saturate quickly and hurts performance. We assume that the intrinsic bottleneck behind the homogenization problem is the identical branch architecture and the coarse ensemble strategy. We propose a novel Adaptive Hierarchy-Branch Fusion framework for Online Knowledge Distillation, termed AHBF-OKD, which designs hierarchical branches and an adaptive hierarchy-branch fusion module to boost model diversity and aggregate complementary knowledge. Specifically, we first introduce hierarchical branch architectures to construct diverse peers by monotonically increasing the depth of the branches on the basis of the target branch. To effectively transfer knowledge from the most complex branch to the simplest target branch, we propose an adaptive hierarchy-branch fusion module that creates hierarchical teacher assistants recursively, regarding the target branch as the smallest teacher assistant. During training, the teacher assistant from the previous hierarchy is explicitly distilled by the teacher assistant and the branch of the current hierarchy. Thus, importance scores are effectively and adaptively allocated to different branches to reduce branch homogenization. Extensive experiments demonstrate the effectiveness of AHBF-OKD on different datasets, including CIFAR-10/100 and ImageNet 2012. For example, on ImageNet 2012, the distilled ResNet-18 achieves a Top-1 error of 29.28%, which significantly outperforms state-of-the-art methods. The source code is available at https://github.com/linruigong965/AHBF.
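To make the recursive fusion-and-distillation idea from the abstract concrete, below is a minimal PyTorch sketch of one training step's loss. It assumes logit-level fusion with softmax-normalized importance scores and a temperature-scaled KL distillation term; the function names (`fuse`, `kd_loss`, `ahbf_losses`) and hyperparameters are illustrative assumptions, not the authors' released implementation (see the linked GitHub repository for that).

```python
# Minimal sketch (not the authors' released code) of the recursive
# hierarchy-branch fusion described in the abstract. The logit-level
# fusion, 2-way softmax weighting, and KL-based distillation loss are
# assumed forms for illustration only.
import torch
import torch.nn.functional as F


def fuse(prev_ta_logits, branch_logits, alpha):
    """Fuse the previous teacher assistant with the current branch
    using adaptively learned importance scores (2-way softmax)."""
    w = torch.softmax(alpha, dim=0)
    return w[0] * prev_ta_logits + w[1] * branch_logits


def kd_loss(student_logits, teacher_logits, T=3.0):
    """Temperature-scaled KL distillation loss (assumed standard form)."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1).detach(),
        reduction="batchmean",
    ) * T * T


def ahbf_losses(branch_logits_list, targets, alphas):
    """branch_logits_list[0] is the simplest (target) branch; deeper
    branches follow. alphas[k] are learnable fusion scores for
    hierarchy k. Returns cross-entropy plus recursive distillation."""
    ta = branch_logits_list[0]            # target branch = smallest teacher assistant
    loss = F.cross_entropy(ta, targets)
    for k in range(1, len(branch_logits_list)):
        loss += F.cross_entropy(branch_logits_list[k], targets)
        new_ta = fuse(ta, branch_logits_list[k], alphas[k - 1])
        # the previous-hierarchy TA is distilled by the current fusion
        loss += kd_loss(ta, new_ta)
        ta = new_ta
    return loss
```

In this sketch, knowledge flows from the deepest branch down to the target branch through the chain of fused teacher assistants, while the learnable scores `alphas` adapt how much each branch contributes at every hierarchy level.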

Keywords:
Computer science, Hierarchy, Artificial intelligence, Class hierarchy, Bottleneck, Distillation, Machine learning, Programming language

Metrics

Cited by: 10
FWCI (Field-Weighted Citation Impact): 1.44
References: 57
Citation Normalized Percentile: 0.80


Topics

Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Brain Tumor Detection and Classification
Life Sciences →  Neuroscience →  Neurology
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Many-objective evolutionary self-knowledge distillation with adaptive branch fusion method

Jiayuan Bai, Yi Zhang

Journal: Information Sciences   Year: 2024   Vol: 669   Pages: 120586-120586
JOURNAL ARTICLE

Diversified branch fusion for self-knowledge distillation

Zuxiang Long, Fuyan Ma, Bin Sun, Mingkui Tan, Shutao Li

Journal: Information Fusion   Year: 2022   Vol: 90   Pages: 12-22
JOURNAL ARTICLE

Layer-fusion for online mutual knowledge distillation

Gan Hu, Yanli Ji, Xingzhu Liang, Yuexing Han

Journal: Multimedia Systems   Year: 2022   Vol: 29 (2)   Pages: 787-796