Structured Pruning for Efficient Generative Pre-trained Language Models

Chaofan Tao; Lu Hou; Haoli Bai; Jiansheng Wei; Xin Jiang; Qun Liu; Ping Luo; Ngai Wong

doi:10.18653/v1/2023.findings-acl.692

JOURNAL ARTICLE

Year: 2023 Pages: 10880-10895

Abstract

The increasing sizes of large generative Pre-trained Language Models (PLMs) hinder their deployment in real-world applications. To obtain efficient PLMs, previous studies mostly focus on pruning the attention heads and feed-forward networks (FFNs) of the Transformer. Nevertheless, we find that in generative PLMs, the hidden dimension shared by many other modules (e.g., embedding layer and layer normalization) contains persistent outliers regardless of the network input. This study comprehensively investigates the structured pruning of generative PLMs with all the above compressible components. To identify redundant network structures, we assign learnable masks over compressible components followed by sparse training. PLMs of various sizes can be flexibly extracted via different thresholds, and are then task-specifically fine-tuned for further improvement. Extensive experiments on language modeling, summarization, and machine translation validate the effectiveness of the proposed method. For example, the pruned BART brings 1.51×/6.96× inference speedup on GPU/CPU with 67% size reduction, and can be further combined with quantization for more than 25× compression.
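
The threshold-based extraction step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the mask values, the toy weight matrix, and the helper names `keep_indices` and `prune_linear` are all hypothetical, and the sparse training that would actually learn the masks is omitted. It only shows how one set of mask values could yield sub-networks of different sizes under different thresholds, assuming a structure is kept when its mask magnitude reaches the threshold.

```python
# Hypothetical sketch of threshold-based sub-network extraction.
# Assumption: the mask values below are invented for illustration; in the
# paper they would be learned during sparse training, which is not shown.

def keep_indices(mask, threshold):
    """Return indices of structures whose mask magnitude reaches the threshold."""
    return [i for i, m in enumerate(mask) if abs(m) >= threshold]


def prune_linear(weight, keep_out, keep_in):
    """Slice a weight matrix (list of rows) to the kept output rows and input columns."""
    return [[weight[r][c] for c in keep_in] for r in keep_out]


# One mask entry per hidden dimension and one per FFN intermediate neuron.
hidden_mask = [0.9, 0.02, 0.7, 0.01, 0.8, 0.6]
ffn_mask = [0.95, 0.05, 0.5, 0.3]

# Toy FFN weight: 4 intermediate neurons (rows) x 6 hidden dimensions (columns).
weight = [[float(r * 10 + c) for c in range(6)] for r in range(4)]

# A higher threshold removes more structures and yields a smaller model.
for threshold in (0.1, 0.4):
    keep_h = keep_indices(hidden_mask, threshold)
    keep_f = keep_indices(ffn_mask, threshold)
    pruned = prune_linear(weight, keep_f, keep_h)
    print(f"threshold={threshold}: pruned shape = {len(pruned)} x {len(pruned[0])}")
```

Because the hidden dimension is shared across modules, the same `keep_h` indices would have to be applied consistently to every layer that reads or writes that dimension (embeddings, layer normalization, attention and FFN projections); the sketch slices a single matrix only.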

Keywords:

Computer science; Speedup; Language model; Generative grammar; Normalization (sociology); Inference; Artificial intelligence; Transformer; Machine learning; Automatic summarization; Quantization (signal processing); Pattern recognition (psychology); Algorithm; Parallel computing

Metrics

FWCI (Field Weighted Citation Impact): 2.30

Citation Normalized Percentile: 0.87

Topics

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Learnable Sparsity Structured Pruning for Acoustic Pre-trained Models

Pruning Pre-trained Language Models Without Fine-Tuning

SparseLLM: Towards Global Pruning of Pre-trained Language Models

Leveraging Generative Pre-trained Models and Discriminative Pre-trained Language Models for Sentiment Analysis

TextPruner: A Model Pruning Toolkit for Pre-Trained Language Models