JOURNAL ARTICLE

Geographic Adaptation of Pretrained Language Models

Valentin Hofmann, Goran Glavaš, Nikola Ljubešić, Janet B. Pierrehumbert, Hinrich Schütze

Year: 2024 Journal: Transactions of the Association for Computational Linguistics Vol: 12 Pages: 411-431 Publisher: Association for Computational Linguistics

Abstract

While pretrained language models (PLMs) have been shown to possess a plethora of linguistic knowledge, the existing body of research has largely neglected extralinguistic knowledge, which is generally difficult to obtain by pretraining on text alone. Here, we contribute to closing this gap by examining geolinguistic knowledge, i.e., knowledge about geographic variation in language. We introduce geoadaptation, an intermediate training step that couples language modeling with geolocation prediction in a multi-task learning setup. We geoadapt four PLMs, covering language groups from three geographic areas, and evaluate them on five different tasks: fine-tuned (i.e., supervised) geolocation prediction, zero-shot (i.e., unsupervised) geolocation prediction, fine-tuned language identification, zero-shot language identification, and zero-shot prediction of dialect features. Geoadaptation is very successful at injecting geolinguistic knowledge into the PLMs: The geoadapted PLMs consistently outperform PLMs adapted using only language modeling (by especially wide margins on zero-shot prediction tasks), and we obtain new state-of-the-art results on two benchmarks for geolocation prediction and language identification. Furthermore, we show that the effectiveness of geoadaptation stems from its ability to geographically retrofit the representation space of the PLMs.

Keywords:
Geolocation, Computer science, Adaptation (eye), Artificial intelligence, Bosnian, Task (project management), Machine learning, Language model, Feature (linguistics), Natural language processing, Linguistics

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 94
Citation Normalized Percentile: 0.11

Topics

Hermeneutics and Narrative Identity
Social Sciences → Arts and Humanities → Philosophy
Aging, Elder Care, and Social Issues
Health Sciences → Health Professions → General Health Professions
Health, Medicine and Society
Health Sciences → Health Professions → General Health Professions

Related Documents

JOURNAL ARTICLE

Use of Pretrained Language Models for Geographic Information Retrieval

Horde Vo, Alexis; Duckham, Matt; He, Estrid; Ranamuka, Nayomi; Benli, Rafe

Journal: Zenodo (CERN European Organization for Nuclear Research) Year: 2025

JOURNAL ARTICLE

Efficient Hierarchical Domain Adaptation for Pretrained Language Models

Alexandra Chronopoulou, Matthew E. Peters, Jesse Dodge

Journal: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies Year: 2022 Pages: 1336-1351

BOOK-CHAPTER

Pretrained language models

Chenguang Zhu

Elsevier eBooks Year: 2021 Pages: 113-133

JOURNAL ARTICLE

VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning

Jun Chen, Han Guo, Kai Yi, Boyang Li, Mohamed Elhoseiny

Journal: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Year: 2022 Pages: 18009-18019