JOURNAL ARTICLE

Contributions of Transformer Attention Heads in Multi- and Cross-lingual Tasks

Abstract

This paper studies the relative importance of attention heads in Transformer-based models to aid their interpretability in cross-lingual and multi-lingual tasks. Prior research has found that only a few attention heads are important in each mono-lingual Natural Language Processing (NLP) task, and pruning the remaining heads leads to comparable or improved performance of the model. However, the impact of pruning attention heads is not yet clear in cross-lingual and multi-lingual tasks. Through extensive experiments, we show that (1) pruning a number of attention heads in a multi-lingual Transformer-based model has, in general, positive effects on its performance in cross-lingual and multi-lingual tasks and (2) the attention heads to be pruned can be ranked using gradients and identified with a few trial experiments. Our experiments focus on sequence labeling tasks, with potential applicability to other cross-lingual and multi-lingual tasks. For comprehensiveness, we examine two pre-trained multi-lingual models, namely multi-lingual BERT (mBERT) and XLM-R, on three tasks across 9 languages each. We also discuss the validity of our findings and their extensibility to truly resource-scarce languages and other task settings.
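As a concrete illustration of the gradient-based ranking the abstract describes, the sketch below scores each attention head by the gradient of the task loss with respect to a differentiable head mask, in the spirit of gradient-based head importance. It is a minimal sketch, assuming a HuggingFace-style mBERT checkpoint (bert-base-multilingual-cased) whose forward pass accepts a head_mask; the toy sentence, placeholder labels, and nine-label tag set are illustrative assumptions, not the authors' code or data.

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Minimal sketch of gradient-based head ranking (not the authors' released code).
# Assumes a HuggingFace-style checkpoint whose forward pass accepts a `head_mask`
# of shape (num_layers, num_heads); the sentence and label ids below stand in
# for a real sequence-labeling batch.

name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=9)
model.eval()

num_layers = model.config.num_hidden_layers
num_heads = model.config.num_attention_heads

# Differentiable mask over heads: 1.0 keeps a head, 0.0 would prune it.
head_mask = torch.ones(num_layers, num_heads, requires_grad=True)

batch = tokenizer("Transformers are multilingual .", return_tensors="pt")
labels = torch.zeros_like(batch["input_ids"])   # placeholder tag ids

outputs = model(**batch, labels=labels, head_mask=head_mask)
outputs.loss.backward()

# |d loss / d mask| serves as a proxy for each head's importance; in practice
# this would be accumulated over many batches and languages before ranking.
importance = head_mask.grad.abs()
ranking = importance.flatten().argsort()        # ascending: least important first
# The lowest-ranked heads are pruning candidates; a few trial runs on a
# development set then decide how many of them to actually remove.
```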

Metrics

Cited By: 6
FWCI (Field Weighted Citation Impact): 0.56
Refs: 0
Citation Normalized Percentile: 0.72

Topics

Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Explainable Artificial Intelligence (XAI)
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

CAR-Transformer: Cross-Attention Reinforcement Transformer for Cross-Lingual Summarization

Yuang Cai, Yuyu Yuan

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, Year: 2024, Vol: 38 (16), Pages: 17718-17726
JOURNAL ARTICLE

Embedded Heterogeneous Attention Transformer for Cross-Lingual Image Captioning

Zijie Song, Zhenzhen Hu, Yuanen Zhou, Ye Zhao, Richang Hong, Meng Wang

Journal: IEEE Transactions on Multimedia, Year: 2024, Vol: 26, Pages: 9008-9020
JOURNAL ARTICLE

GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction

Wasi Uddin Ahmad, Nanyun Peng, Kai-Wei Chang

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, Year: 2021, Vol: 35 (14), Pages: 12462-12470