Iterative pruning is one of the most effective compression methods for pre-trained language models. We show that finding the optimal pruning decision can be formulated as an equality-constrained 0-1 Integer Linear Programming problem. The solution to this optimization problem yields a principled importance criterion, which we use to rank parameters during iterative model pruning. To mitigate poor generalization at high sparsity levels, we propose a self-regularization scheme in which the model's predictions are regularized by those of its latest checkpoint, whose sparsity increases as pruning proceeds. Our experiments on natural language understanding, question answering, named entity recognition, and data-to-text generation with various Transformer-based PLMs demonstrate the effectiveness of the approach at various sparsity levels.
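To make the procedure concrete, below is a minimal sketch (not the authors' released code) of an iterative pruning loop with self-regularization, written for a HuggingFace-style classification model. The cubic sparsity schedule, the first-order `importance_score` used as a stand-in for the ILP-derived criterion, and the hyperparameters `alpha` and `ckpt_every` are all illustrative assumptions, not details from the paper.

```python
# Illustrative sketch of iterative importance-based pruning with self-regularization.
import copy
import torch
import torch.nn.functional as F

def sparsity_at(step, total_steps, final_sparsity):
    # Cubic sparsity schedule (a common choice in gradual pruning; assumed here).
    progress = min(step / total_steps, 1.0)
    return final_sparsity * (1.0 - (1.0 - progress) ** 3)

def importance_score(param):
    # Placeholder first-order criterion |w * grad|, standing in for the
    # ILP-derived importance criterion described in the abstract.
    return (param.data * param.grad).abs()

def apply_global_mask(model, sparsity):
    # Rank all prunable weights globally by importance and zero the lowest-ranked fraction.
    scores, params = [], []
    for _, p in model.named_parameters():
        if p.dim() > 1 and p.grad is not None:  # prune weight matrices only
            scores.append(importance_score(p).flatten())
            params.append(p)
    all_scores = torch.cat(scores)
    k = int(sparsity * all_scores.numel())
    if k == 0:
        return
    threshold = torch.kthvalue(all_scores, k).values
    for p in params:
        p.data.mul_((importance_score(p) > threshold).float())

def train_with_self_regularization(model, loader, optimizer, total_steps,
                                   final_sparsity=0.9, alpha=0.5, ckpt_every=500):
    teacher = copy.deepcopy(model).eval()  # latest (denser) checkpoint
    step = 0
    for inputs, labels in loader:
        outputs = model(**inputs)  # assumes a HuggingFace-style model returning .logits
        task_loss = F.cross_entropy(outputs.logits, labels)
        with torch.no_grad():
            teacher_logits = teacher(**inputs).logits
        # Self-regularization: match current predictions to the latest
        # checkpoint's predictions via KL divergence on soft labels.
        reg_loss = F.kl_div(F.log_softmax(outputs.logits, dim=-1),
                            F.softmax(teacher_logits, dim=-1),
                            reduction="batchmean")
        loss = (1 - alpha) * task_loss + alpha * reg_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Re-prune after the update so removed weights stay at zero.
        apply_global_mask(model, sparsity_at(step, total_steps, final_sparsity))
        step += 1
        if step % ckpt_every == 0:
            teacher = copy.deepcopy(model).eval()  # refresh teacher with the sparser checkpoint
        if step >= total_steps:
            break
```

In this sketch the regularization target is periodically replaced by the current, sparser model, so the teacher's sparsity grows along with the student's, mirroring the "latest checkpoint with increasing sparsity" idea at a high level.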