Abstract

We consider zero-shot cross-lingual transfer in legal topic classification using the recent Multi-EURLEX dataset. Since the original dataset contains parallel documents, which is unrealistic for zero-shot cross-lingual transfer, we develop a new version of the dataset without parallel documents. We use it to show that translation-based methods vastly outperform cross-lingual fine-tuning of multilingually pre-trained models, the best previous zero-shot transfer method for Multi-EURLEX. We also develop a bilingual teacher-student zero-shot transfer approach, which exploits additional unlabeled documents of the target language and performs better than a model fine-tuned directly on labeled target language documents.
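The teacher-student transfer described in the abstract rests on knowledge distillation: a teacher trained on labeled (e.g. translated) source-language documents soft-labels unlabeled target-language documents, and a student is trained to match those soft label distributions. Below is a minimal, hypothetical sketch of the distillation objective only; the function names and toy logits are illustrative, not the paper's actual models or API.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_probs):
    """Soft cross-entropy: penalize the student for deviating from the
    teacher's predicted label distribution on an unlabeled document."""
    student_probs = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

# Hypothetical outputs for one unlabeled target-language document:
# the teacher's soft labels become the student's training signal.
teacher_probs = softmax([2.0, 0.5, -1.0])   # illustrative teacher logits
student_logits = [1.5, 0.2, -0.5]           # illustrative student logits
loss = distillation_loss(student_logits, teacher_probs)
```

Minimizing this loss over many unlabeled target-language documents is what lets the student exploit data the labeled training set does not cover; the loss is smallest when the student reproduces the teacher's distribution exactly.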

Keywords:
Zero-shot learning, Cross-lingual transfer, Legal topic classification, Natural language processing, Transfer learning, Machine translation, Multilingual pre-trained models

Metrics

Cited by: 7
FWCI (Field-Weighted Citation Impact): 1.37
References: 14
Citation Normalized Percentile: 0.79

Topics

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Artificial Intelligence in Law (Social Sciences → Political Science and International Relations)