JOURNAL ARTICLE

Contributions of Transformer Attention Heads in Multi- and Cross-lingual Tasks

Abstract

This paper studies the relative importance of attention heads in Transformer-based models to aid their interpretability in cross-lingual and multi-lingual tasks. Prior research has found that only a few attention heads are important in each mono-lingual Natural Language Processing (NLP) task, and pruning the remaining heads leads to comparable or improved performance of the model. However, the impact of pruning attention heads is not yet clear in cross-lingual and multi-lingual tasks. Through extensive experiments, we show that (1) pruning a number of attention heads in a multi-lingual Transformer-based model has, in general, positive effects on its performance in cross-lingual and multi-lingual tasks and (2) the attention heads to be pruned can be ranked using gradients and identified with a few trial experiments. Our experiments focus on sequence labeling tasks, with potential applicability to other cross-lingual and multi-lingual tasks. For comprehensiveness, we examine two pre-trained multi-lingual models, namely multi-lingual BERT (mBERT) and XLM-R, on three tasks across 9 languages each. We also discuss the validity of our findings and their extensibility to truly resource-scarce languages and other task settings.
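As a concrete illustration of the gradient-based ranking the abstract describes, the sketch below scores each attention head by the gradient of the task loss with respect to a differentiable head mask, in the spirit of gradient-based head importance. It is a minimal sketch, assuming a HuggingFace-style mBERT checkpoint (bert-base-multilingual-cased) whose forward pass accepts a head_mask; the toy sentence, placeholder labels, and nine-label tag set are illustrative assumptions, not the authors' code or data.

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Minimal sketch of gradient-based head ranking (not the authors' released code).
# Assumes a HuggingFace-style checkpoint whose forward pass accepts a `head_mask`
# of shape (num_layers, num_heads); the sentence and label ids below stand in
# for a real sequence-labeling batch.

name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=9)
model.eval()

num_layers = model.config.num_hidden_layers
num_heads = model.config.num_attention_heads

# Differentiable mask over heads: 1.0 keeps a head, 0.0 would prune it.
head_mask = torch.ones(num_layers, num_heads, requires_grad=True)

batch = tokenizer("Transformers are multilingual .", return_tensors="pt")
labels = torch.zeros_like(batch["input_ids"])   # placeholder tag ids

outputs = model(**batch, labels=labels, head_mask=head_mask)
outputs.loss.backward()

# |d loss / d mask| serves as a proxy for each head's importance; in practice
# this would be accumulated over many batches and languages before ranking.
importance = head_mask.grad.abs()
ranking = importance.flatten().argsort()        # ascending: least important first
# The lowest-ranked heads are pruning candidates; a few trial runs on a
# development set then decide how many of them to actually remove.
```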

Metrics

Cited By: 6
FWCI (Field Weighted Citation Impact): 0.56
Refs: 0
Citation Normalized Percentile: 0.72

Topics

Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Explainable Artificial Intelligence (XAI)
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

CAR-Transformer: Cross-Attention Reinforcement Transformer for Cross-Lingual Summarization

Yuang Cai, Yuyu Yuan

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, Year: 2024, Vol: 38 (16), Pages: 17718-17726
JOURNAL ARTICLE

Embedded Heterogeneous Attention Transformer for Cross-Lingual Image Captioning

Zijie Song, Zhenzhen Hu, Yuanen Zhou, Ye Zhao, Richang Hong, Meng Wang

Journal: IEEE Transactions on Multimedia, Year: 2024, Vol: 26, Pages: 9008-9020
JOURNAL ARTICLE

GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction

Wasi Uddin Ahmad, Nanyun Peng, Kai-Wei Chang

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, Year: 2021, Vol: 35 (14), Pages: 12462-12470