Abstract

We consider zero-shot cross-lingual transfer in legal topic classification using the recent Multi-EURLEX dataset. Since the original dataset contains parallel documents, which is unrealistic for zero-shot cross-lingual transfer, we develop a new version of the dataset without parallel documents. We use it to show that translation-based methods vastly outperform cross-lingual fine-tuning of multilingually pre-trained models, the best previous zero-shot transfer method for Multi-EURLEX. We also develop a bilingual teacher-student zero-shot transfer approach, which exploits additional unlabeled documents of the target language and performs better than a model fine-tuned directly on labeled target language documents.
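The teacher-student transfer described in the abstract rests on knowledge distillation: a teacher trained on labeled (e.g. translated) source-language documents soft-labels unlabeled target-language documents, and a student is trained to match those soft label distributions. Below is a minimal, hypothetical sketch of the distillation objective only; the function names and toy logits are illustrative, not the paper's actual models or API.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_probs):
    """Soft cross-entropy: penalize the student for deviating from the
    teacher's predicted label distribution on an unlabeled document."""
    student_probs = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

# Hypothetical outputs for one unlabeled target-language document:
# the teacher's soft labels become the student's training signal.
teacher_probs = softmax([2.0, 0.5, -1.0])   # illustrative teacher logits
student_logits = [1.5, 0.2, -0.5]           # illustrative student logits
loss = distillation_loss(student_logits, teacher_probs)
```

Minimizing this loss over many unlabeled target-language documents is what lets the student exploit data the labeled training set does not cover; the loss is smallest when the student reproduces the teacher's distribution exactly.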

Keywords:
Zero-shot learning, Cross-lingual transfer, Legal topic classification, Natural language processing, Transfer learning, Machine translation, Multilingual pre-trained models

Metrics

Cited by: 7
FWCI (Field-Weighted Citation Impact): 1.37
References: 14
Citation Normalized Percentile: 0.79

Topics

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Artificial Intelligence in Law (Social Sciences → Political Science and International Relations)