JOURNAL ARTICLE

Two-stage optimization based on heterogeneous branch fusion for knowledge distillation

Abstract

Knowledge distillation transfers knowledge from a teacher model to a student model, effectively improving the student's performance. However, relying solely on the teacher's fixed knowledge for guidance offers no supplementation or expansion of that knowledge, which limits the student model's generalization ability. This paper therefore proposes two-stage optimization based on heterogeneous branch fusion for knowledge distillation (THFKD), which supplies the student model with appropriate knowledge at each stage through a two-stage optimization strategy. Specifically, the pre-trained teacher offers stable and comprehensive static knowledge, preventing the student from deviating from the target early in training. Meanwhile, the student model acquires rich feature representations through heterogeneous branches and a progressive feature fusion module, generating dynamically updated collaborative learning objectives and thus effectively enhancing the diversity of dynamic knowledge. Finally, in the first stage a ramp-up weight gradually increases the loss weight over the stage, while in the second stage consistent loss weights are applied. The two-stage optimization strategy fully exploits the advantages of each type of knowledge, thereby improving the generalization ability of the student model. Although no tests of statistical significance were carried out, our experimental results on standard datasets (CIFAR-100, Tiny-ImageNet) and long-tailed datasets (CIFAR100-LT) suggest that THFKD may slightly improve the student model's classification accuracy and generalization ability. For instance, with ResNet110-ResNet32 on the CIFAR-100 dataset, the accuracy reaches 75.41%, a 1.52% improvement over the state of the art (SOTA).
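The abstract's two-stage schedule (a ramp-up weight in the first stage, a constant weight in the second) can be sketched as follows. The exponential ramp-up form, the function name `kd_loss_weight`, and the parameter `rampup_epochs` are assumptions for illustration; the paper's exact schedule is not given in this summary.

```python
import math

def kd_loss_weight(epoch: int, rampup_epochs: int, max_weight: float = 1.0) -> float:
    """Two-stage schedule for the distillation-loss weight.

    Stage 1 (epoch < rampup_epochs): the weight ramps up gradually.
    Stage 2 (epoch >= rampup_epochs): the weight is held constant.
    """
    if epoch >= rampup_epochs:
        return max_weight  # stage 2: consistent loss weight
    phase = 1.0 - epoch / rampup_epochs
    # Exponential ramp-up, a common choice in semi-supervised/distillation
    # training; the actual functional form used by THFKD is assumed here.
    return max_weight * math.exp(-5.0 * phase * phase)
```

In training, such a weight would typically scale the distillation term of the objective, e.g. `loss = ce_loss + kd_loss_weight(epoch, rampup_epochs) * kd_loss`, so that the student relies mostly on the ground-truth labels early on and on the fused teacher/branch knowledge later.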

Keywords:
Knowledge distillation, Generalization, Feature fusion, Machine learning, Artificial intelligence, Computer science

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 36
Citation Normalized Percentile: 0.21

Topics

Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Machine Learning and Data Classification
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Diversified branch fusion for self-knowledge distillation

Zuxiang Long, Fuyan Ma, Bin Sun, Mingkui Tan, Shutao Li

Journal: Information Fusion, Year: 2022, Vol: 90, Pages: 12-22
JOURNAL ARTICLE

Adaptive Hierarchy-Branch Fusion for Online Knowledge Distillation

Linrui Gong, Shaohui Lin, Baochang Zhang, Yunhang Shen, Ke Li, Ruizhi Qiao, Bo Ren, Mu‐Qing Li, Yu Zhou, Lizhuang Ma

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, Year: 2023, Vol: 37 (6), Pages: 7731-7739
JOURNAL ARTICLE

Two-stage model fusion scheme based on knowledge distillation for stragglers in federated learning

Jiuyun Xu, Xiaowen Li, Kanjie Zhu, Liang Yan, Y. B. Zhao

Journal: International Journal of Machine Learning and Cybernetics, Year: 2024, Vol: 16 (5-6), Pages: 3067-3083
JOURNAL ARTICLE

Radar Self-Evolution Detection: Two-Stage Knowledge Transfer via Distillation–Fusion Synergy

Chuanfei Zang, Guolong Cui, Yumiao Wang, X. Chen, Xiang Wang, Yanbin Wang, Xiaobo Yang

Journal: IEEE Transactions on Aerospace and Electronic Systems, Year: 2025, Vol: 61 (6), Pages: 16817-16836