JOURNAL ARTICLE

PET: Parameter-efficient Knowledge Distillation on Transformer

Hyojin Jeon, Seungcheol Park, Jin-Gee Kim, U Kang

Year: 2023 Journal: PLoS ONE Vol: 18 (7) Pages: e0288060 Publisher: Public Library of Science

Abstract

Given a large Transformer model, how can we obtain a small and computationally efficient model that maintains the performance of the original? Transformers have shown significant performance improvements on many NLP tasks in recent years. However, their large size, expensive computational cost, and long inference time make them challenging to deploy on resource-constrained devices. Existing Transformer compression methods focus mainly on reducing the size of the encoder, ignoring the fact that the decoder accounts for the major portion of the long inference time. In this paper, we propose PET (Parameter-Efficient knowledge distillation on Transformer), an efficient Transformer compression method that reduces the size of both the encoder and the decoder. In PET, we identify and exploit pairs of parameter groups for efficient weight sharing, and employ a warm-up process using a simplified task to increase the gain from knowledge distillation. Extensive experiments on five real-world datasets show that PET outperforms existing methods on machine translation tasks. Specifically, on the IWSLT’14 EN→DE task, PET reduces memory usage by 81.20% and accelerates inference by 45.15% compared to the uncompressed model, with a minor BLEU decrease of 0.27.
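
The abstract names two mechanisms: weight sharing across identified pairs of parameter groups in the encoder and decoder, and knowledge distillation preceded by a warm-up on a simplified task. The sketch below is a minimal illustration in PyTorch, not the authors' code: it ties the feed-forward weights of hypothetically paired decoder layers and defines a standard soft-label distillation loss. The pairing scheme, the choice of shared sublayers, and the temperature are all illustrative assumptions; the paper's warm-up stage is procedural and omitted here.

```python
import torch.nn as nn
import torch.nn.functional as F

d_model, n_layers = 512, 6  # illustrative sizes, not the paper's configuration

# A stack of decoder layers; the sharing below ties parameters across pairs.
layers = nn.ModuleList(
    nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
    for _ in range(n_layers)
)

# Illustrative pairing: layer i shares its feed-forward sublayer with
# layer i + n_layers // 2, so each pair keeps one set of those weights.
for i in range(n_layers // 2):
    layers[i + n_layers // 2].linear1 = layers[i].linear1  # shared up-projection
    layers[i + n_layers // 2].linear2 = layers[i].linear2  # shared down-projection

# ModuleList.parameters() deduplicates shared tensors, so unique < total.
total = sum(p.numel() for layer in layers for p in layer.parameters())
unique = sum(p.numel() for p in layers.parameters())
print(f"unique parameters: {unique:,} (vs. {total:,} without sharing)")

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Soft-label distillation loss: KL(student || teacher) at temperature T."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
```

Tying the module references, rather than copying weights, means the shared parameters receive gradients from both layers during distillation, which is what makes this kind of sharing parameter-efficient rather than merely an initialization trick.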

Keywords:
Transformer; Computer science; Engineering; Electrical engineering; Voltage

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 1.55
Refs: 34
Citation Normalized Percentile: 0.79

Topics

Medical Imaging Techniques and Applications (Health Sciences → Medicine → Radiology, Nuclear Medicine and Imaging)
Machine Learning in Materials Science (Physical Sciences → Materials Science → Materials Chemistry)
Radiomics and Machine Learning in Medical Imaging (Health Sciences → Medicine → Radiology, Nuclear Medicine and Imaging)

Related Documents

JOURNAL ARTICLE

Parameter-Efficient and Student-Friendly Knowledge Distillation

Jun Rao, Xv Meng, Liang Ding, Shuhan Qi, Xuebo Liu, Min Zhang, Dacheng Tao

Journal: IEEE Transactions on Multimedia Year: 2023 Vol: 26 Pages: 4230-4241
JOURNAL ARTICLE

TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation

Ruiping Liu, Kailun Yang, Alina Roitberg, Jiaming Zhang, Kunyu Peng, Huayao Liu, Yaonan Wang, Rainer Stiefelhagen

Journal: IEEE Transactions on Intelligent Transportation Systems Year: 2024 Vol: 25 (12) Pages: 20933-20949
JOURNAL ARTICLE

Adaptive class token knowledge distillation for efficient vision transformer

Minchan Kang, Sanghyeok Son, Dae-Shik Kim

Journal: Knowledge-Based Systems Year: 2024 Vol: 304 Pages: 112531
JOURNAL ARTICLE

Parameter-efficient online knowledge distillation for pretrained language models

Yukun Wang, Jin Wang, Xuejie Zhang

Journal: Expert Systems with Applications Year: 2024 Vol: 265 Pages: 126040