JOURNAL ARTICLE

Learning Syntax Without Planting Trees: Understanding Hierarchical Generalization in Transformers

Kabir Ahuja, Vidhisha Balachandran, Madhur Panwar, Tianxing He, Noah A. Smith, Navin Goyal, Yulia Tsvetkov

Year: 2025; Journal: Transactions of the Association for Computational Linguistics; Vol: 13; Pages: 121-141; Publisher: Association for Computational Linguistics

Abstract

Transformers trained on natural language data have been shown to exhibit hierarchical generalization without explicitly encoding any structural bias. In this work, we investigate sources of inductive bias in transformer models and their training that could cause such a preference for hierarchical generalization. We extensively experiment with transformers trained on five synthetic, controlled datasets using several training objectives and show that, while objectives such as sequence-to-sequence modeling, classification, etc., often fail to lead to hierarchical generalization, the language modeling objective consistently leads to transformers generalizing hierarchically. We then study how different generalization behaviors emerge during training by conducting pruning experiments that reveal the joint existence of subnetworks within the model implementing different generalizations. Finally, we take a Bayesian perspective to understand transformers' preference for hierarchical generalization: we establish a correlation between whether transformers generalize hierarchically on a dataset and whether the simplest explanation of that dataset is provided by a hierarchical grammar rather than by regular grammars exhibiting linear generalization. Overall, our work presents new insights on the origins of hierarchical generalization in transformers and provides a theoretical framework for studying generalization in language models.
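The Bayesian argument above weighs how simple a grammar is against how well it fits the data. The sketch below illustrates that trade-off on the toy language a^n b^n, assuming a description-length prior (every symbol written in a production costs log |alphabet| nats) and a probabilistic grammar with uniform rule probabilities. The two grammars, the helper names `grammar_size`, `log_prior`, `string_prob`, and `log_posterior`, and the scoring scheme are hypothetical illustrations, not the authors' grammars or implementation.

```python
import math
from functools import lru_cache

# Hypothetical toy encoding: a grammar is a list of (lhs, rhs) productions with a
# single nonterminal "S"; every other character in an rhs string is a terminal.


def grammar_size(rules):
    """Number of symbols written down across all productions (LHS plus RHS)."""
    return sum(1 + len(rhs) for _, rhs in rules)


def log_prior(rules):
    """Description-length prior: each written symbol is drawn uniformly from the
    grammar's own alphabet, so larger grammars get exponentially less prior mass."""
    alphabet = {lhs for lhs, _ in rules} | {sym for _, rhs in rules for sym in rhs}
    return -grammar_size(rules) * math.log(len(alphabet))


def string_prob(rules, s):
    """P(s | G) under uniform rule probabilities. Assumes each rhs contains at
    most one "S" and at least one terminal, so recursion always shrinks the string."""
    p_rule = 1.0 / len(rules)

    @lru_cache(maxsize=None)
    def p(t):
        total = 0.0
        for _, rhs in rules:
            if "S" in rhs:
                i = rhs.index("S")
                pre, suf = rhs[:i], rhs[i + 1:]
                # S cannot derive the empty string, so the middle must be non-empty.
                if len(t) > len(pre) + len(suf) and t.startswith(pre) and t.endswith(suf):
                    total += p_rule * p(t[len(pre):len(t) - len(suf)])
            elif t == rhs:
                total += p_rule
        return total

    return p(s)


def log_posterior(rules, data):
    """Unnormalized log posterior: log P(G) + sum of log P(s | G) over the data."""
    total = log_prior(rules)
    for s in data:
        prob = string_prob(rules, s)
        if prob == 0.0:
            return -math.inf  # G cannot generate s at all
        total += math.log(prob)
    return total


# Recursive (hierarchical) grammar versus one that memorizes each observed string.
hier = [("S", "aSb"), ("S", "ab")]
flat = [("S", "ab"), ("S", "aabb"), ("S", "aaabbb")]
data = ["ab", "aabb", "aaabbb"]

print("hierarchical:", round(log_posterior(hier, data), 3))
print("memorizing:  ", round(log_posterior(flat, data), 3))
# The recursive grammar has the higher (less negative) log posterior here.
```

In this toy, the recursive grammar wins because its prior penalty is much smaller, even though the memorizing grammar assigns each observed string a higher probability. Repeating the same three strings eleven times is enough for the likelihood term to favor the memorizing grammar, which shows why the outcome of such a comparison depends on the training data and not only on the grammars themselves.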

Keywords:
Computer science, Transformer, Generalization, Artificial intelligence, Natural language processing, Syntax, Machine learning, Mathematics
