JOURNAL ARTICLE

Two-stage optimization based on heterogeneous branch fusion for knowledge distillation

Abstract

Knowledge distillation transfers knowledge from a teacher model to a student model, effectively improving the student's performance. However, relying solely on the teacher's fixed knowledge for guidance offers no supplementation or expansion of that knowledge, which limits the student model's generalization ability. This paper therefore proposes two-stage optimization based on heterogeneous branch fusion for knowledge distillation (THFKD), which supplies the student model with appropriate knowledge at each stage through a two-stage optimization strategy. Specifically, the pre-trained teacher offers stable and comprehensive static knowledge, preventing the student from deviating from the target early in training. Meanwhile, the student model acquires rich feature representations through heterogeneous branches and a progressive feature fusion module, generating dynamically updated collaborative learning objectives and thus effectively enhancing the diversity of dynamic knowledge. Finally, in the first stage a ramp-up weight gradually increases the loss weight over the stage, while in the second stage consistent loss weights are applied. The two-stage optimization strategy fully exploits the advantages of each type of knowledge, thereby improving the generalization ability of the student model. Although no tests of statistical significance were carried out, our experimental results on standard datasets (CIFAR-100, Tiny-ImageNet) and long-tailed datasets (CIFAR100-LT) suggest that THFKD may slightly improve the student model's classification accuracy and generalization ability. For instance, with ResNet110-ResNet32 on the CIFAR-100 dataset, the accuracy reaches 75.41%, a 1.52% improvement over the state of the art (SOTA).
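The abstract's two-stage schedule (a ramp-up weight in the first stage, a constant weight in the second) can be sketched as follows. The exponential ramp-up form, the function name `kd_loss_weight`, and the parameter `rampup_epochs` are assumptions for illustration; the paper's exact schedule is not given in this summary.

```python
import math

def kd_loss_weight(epoch: int, rampup_epochs: int, max_weight: float = 1.0) -> float:
    """Two-stage schedule for the distillation-loss weight.

    Stage 1 (epoch < rampup_epochs): the weight ramps up gradually.
    Stage 2 (epoch >= rampup_epochs): the weight is held constant.
    """
    if epoch >= rampup_epochs:
        return max_weight  # stage 2: consistent loss weight
    phase = 1.0 - epoch / rampup_epochs
    # Exponential ramp-up, a common choice in semi-supervised/distillation
    # training; the actual functional form used by THFKD is assumed here.
    return max_weight * math.exp(-5.0 * phase * phase)
```

In training, such a weight would typically scale the distillation term of the objective, e.g. `loss = ce_loss + kd_loss_weight(epoch, rampup_epochs) * kd_loss`, so that the student relies mostly on the ground-truth labels early on and on the fused teacher/branch knowledge later.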

Keywords:
Knowledge distillation, Generalization, Feature fusion, Machine learning, Artificial intelligence, Computer science

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 36
Citation Normalized Percentile: 0.21

Topics

Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Machine Learning and Data Classification
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Diversified branch fusion for self-knowledge distillation

Zuxiang Long, Fuyan Ma, Bin Sun, Mingkui Tan, Shutao Li

Journal: Information Fusion, Year: 2022, Vol: 90, Pages: 12-22
JOURNAL ARTICLE

Adaptive Hierarchy-Branch Fusion for Online Knowledge Distillation

Linrui Gong, Shaohui Lin, Baochang Zhang, Yunhang Shen, Ke Li, Ruizhi Qiao, Bo Ren, Mu‐Qing Li, Yu Zhou, Lizhuang Ma

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, Year: 2023, Vol: 37 (6), Pages: 7731-7739
JOURNAL ARTICLE

Two-stage model fusion scheme based on knowledge distillation for stragglers in federated learning

Jiuyun Xu, Xiaowen Li, Kanjie Zhu, Liang Yan, Y. B. Zhao

Journal: International Journal of Machine Learning and Cybernetics, Year: 2024, Vol: 16 (5-6), Pages: 3067-3083
JOURNAL ARTICLE

Radar Self-Evolution Detection: Two-Stage Knowledge Transfer via Distillation–Fusion Synergy

Chuanfei Zang, Guolong Cui, Yumiao Wang, X. Chen, Xiang Wang, Yanbin Wang, Xiaobo Yang

Journal: IEEE Transactions on Aerospace and Electronic Systems, Year: 2025, Vol: 61 (6), Pages: 16817-16836