JOURNAL ARTICLE

Composable Sparse Fine-Tuning for Cross-Lingual Transfer

Alan Ansell, Edoardo Maria Ponti, Anna Korhonen, Ivan Vulić

Year: 2022 · Journal: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) · Pages: 1778-1796

Abstract

Fine-tuning the entire set of parameters of a large pretrained model has become the mainstream approach for transfer learning. To increase its efficiency and prevent catastrophic forgetting and interference, techniques like adapters and sparse fine-tuning have been developed. Adapters are modular, as they can be combined to adapt a model towards different facets of knowledge (e.g., dedicated language and/or task adapters). Sparse fine-tuning is expressive, as it controls the behavior of all model components. In this work, we introduce a new fine-tuning method with both these desirable properties. In particular, we learn sparse, real-valued masks based on a simple variant of the Lottery Ticket Hypothesis. Task-specific masks are obtained from annotated data in a source language, and language-specific masks from masked language modeling in a target language. Both these masks can then be composed with the pretrained model. Unlike adapter-based fine-tuning, this method neither increases the number of parameters at inference time nor alters the original model architecture. Most importantly, it outperforms adapters in zero-shot cross-lingual transfer by a large margin in a series of multilingual benchmarks, including Universal Dependencies, MasakhaNER, and AmericasNLI. Based on an in-depth analysis, we additionally find that sparsity is crucial to prevent both 1) interference between the fine-tunings to be composed and 2) overfitting. We release the code and models at https://github.com/cambridgeltl/composable-sft.
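
The method the abstract describes reduces to plain parameter arithmetic, so it can be sketched briefly. The following is a minimal, hypothetical PyTorch sketch, not the API of the released composable-sft library: topk_mask implements the Lottery Ticket-style selection step the abstract refers to (keep the k parameters that moved furthest during a preliminary full fine-tuning run), and compose adds sparse language and task difference vectors to the pretrained weights.

```python
import torch

def topk_mask(pretrained, finetuned, k):
    """Lottery Ticket-style selection (hypothetical helper): keep the k
    parameters that moved furthest from their pretrained values during a
    preliminary full fine-tuning run.

    `pretrained` and `finetuned` are state dicts (name -> tensor).
    Returns a dict of boolean masks with the same shapes.
    """
    deltas = {name: (finetuned[name] - pretrained[name]).abs()
              for name in pretrained}
    # Global threshold: the smallest of the k largest absolute changes.
    all_deltas = torch.cat([d.flatten() for d in deltas.values()])
    threshold = torch.topk(all_deltas, k).values.min()
    return {name: d >= threshold for name, d in deltas.items()}

def compose(model, *diffs):
    """Compose sparse fine-tunings with a pretrained model by adding each
    sparse difference vector (e.g., one language SFT and one task SFT) to
    the matching base parameters. Pure addition: no extra parameters or
    architecture changes at inference time.
    """
    with torch.no_grad():
        for name, param in model.named_parameters():
            for diff in diffs:
                if name in diff:
                    param.add_(diff[name].to(param.device))
    return model
```

In the full method, a second fine-tuning run restarts from the pretrained weights and updates only the masked entries; the resulting sparse difference vectors (one per target language from masked language modeling, one per task from source-language supervision) are what compose() would add, e.g. compose(model, language_diff, task_diff). Since composition is pure addition, inference cost and architecture are unchanged, as the abstract notes.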

Keywords:
Computer science, Language model, Fine-tuning, Modular design, Overfitting, Artificial intelligence, Task (project management), Inference, Adapter (computing), Programming language, Computer hardware

Metrics

Cited by: 59
FWCI (Field-Weighted Citation Impact): 6.94
References: 60
Citation Normalized Percentile: 0.97 (in the top 10%)

Topics

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Domain Adaptation and Few-Shot Learning (Physical Sciences → Computer Science → Artificial Intelligence)

Related Documents

DISSERTATION

Efficient and Composable Adaptation for Cross-Lingual Transfer

Ansell, Alan

University: Apollo (University of Cambridge) · Year: 2024

JOURNAL ARTICLE

Effective Fine-Tuning Methods for Cross-lingual Adaptation

Tao Yu, Shafiq Joty

Journal: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing · Year: 2021 · Pages: 8492-8501

JOURNAL ARTICLE

Cross-Lingual Cross-Modal Retrieval With Noise-Robust Fine-Tuning

Rui Cai, Jianfeng Dong, Tianxiang Liang, Yonghui Liang, Yabing Wang, Xun Yang, Xun Wang, Meng Wang

Journal: IEEE Transactions on Knowledge and Data Engineering · Year: 2024 · Vol: 36 (11) · Pages: 5860-5873