DISSERTATION

Scalable syntactic inductive biases for neural language models

Kuncoro, Adhiguna Surya

Year: 2022 · University: Oxford University Research Archive (ORA) (University of Oxford) · Publisher: University of Oxford

Abstract

Natural language has a sequential surface form, although its underlying structure has been argued to be hierarchical and tree-structured in nature, whereby smaller linguistic units like words are recursively composed to form larger ones, such as phrases and sentences. This thesis aims to answer the following open research questions: To what extent---if at all---can more explicit notions of hierarchical syntactic structures further improve the performance of neural models within NLP, even within the context of successful models like BERT that learn from large amounts of data? And where exactly would stronger notions of syntactic structures be beneficial in different types of language understanding tasks?

To answer these questions, we explore two approaches for augmenting neural sequence models with an inductive bias that encourages a more explicit modelling of hierarchical syntactic structures. In the first approach, we use existing techniques that design tree-structured neural networks, where the ordering of the computational operations is determined by hierarchical syntax trees. We discover that this approach is indeed effective for designing better and more robust models at various challenging benchmarks of syntactic competence, although these benefits nevertheless come at the expense of scalability: In practice, such tree-structured models are much more challenging to scale to large datasets.

Hence, in the second approach, we devise a novel knowledge distillation strategy for combining the best of both syntactic inductive biases and data scale. Our proposed approach is effective across different neural sequence modelling architectures and objective functions: By applying our approach on top of a left-to-right LSTM, we design a distilled syntax-aware (DSA) LSTM that achieves a new state of the art (as of mid-2019) and human-level performance at targeted syntactic evaluations. By applying our approach on top of a Transformer-based BERT masked language model that works well at scale, we outperform a strong BERT baseline on six structured prediction tasks---including those that are not explicitly syntactic in nature---in addition to the Corpus of Linguistic Acceptability (CoLA). Notably, our approach yields a new state of the art (as of mid-2020)---among models pre-trained on the original BERT dataset---on four structured prediction tasks: In-domain and out-of-domain phrase-structure parsing, dependency parsing, and semantic role labelling.

Altogether, our findings and methods in this work: (i) provide an example of how existing linguistic theories (particularly concerning the syntax of language), annotations, and resources can be used both as diagnostic evaluation tools, and also as a source of prior knowledge for crafting inductive biases that can improve the performance of computational models of language; (ii) showcase the continued relevance and benefits of more explicit syntactic inductive biases, even within the context of scalable neural models like BERT that can derive their knowledge from large amounts of data; (iii) contribute to a better understanding of where exactly syntactic biases are most helpful in different types of NLP tasks; and (iv) motivate the broader question of how we can design models that integrate stronger syntactic biases---and yet can be easily scalable at the same time---as a promising (if relatively underexplored) direction of NLP research.
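The distillation strategy summarised above can be illustrated with a minimal sketch: a student sequence model is trained on an interpolation of the usual next-word cross-entropy and a KL term that pulls its predictive distribution towards that of a syntax-aware teacher. This is a generic word-level distillation objective, not the thesis's exact formulation; the function name `distillation_loss` and the `alpha` and `temperature` parameters are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      alpha=0.5, temperature=1.0):
    """Interpolate hard-label cross-entropy with a soft-label KL term.

    student_logits, teacher_logits: (batch, vocab) unnormalised scores.
    targets: (batch,) gold next-word indices.
    alpha: weight on the teacher's soft labels.
    """
    # Hard-label term: standard language-modelling cross-entropy.
    ce = F.cross_entropy(student_logits, targets)
    # Soft-label term: KL(teacher || student) over the vocabulary,
    # optionally smoothed by a temperature.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(student_logprobs, teacher_probs, reduction="batchmean")
    return alpha * kl + (1.0 - alpha) * ce
```

With `alpha = 0` this reduces to ordinary language-model training; with `alpha = 1` the student learns purely from the teacher's distributions, which is how a syntactic bias can be transferred to a scalable student without running the expensive teacher at inference time.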

Keywords:
Syntax; Language model; Inductive bias; Scalability; Artificial neural network; Sequence; Natural language; Context

Metrics

Cited by: 0
FWCI (Field-Weighted Citation Impact): 0.00
References: 0

Topics

- Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
- Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
- Machine Learning in Healthcare (Physical Sciences → Computer Science → Artificial Intelligence)

Related Documents

JOURNAL ARTICLE

Syntactic Inductive Biases for Natural Language Processing

Swayamdipta, Swabha

Journal: KiltHub Repository · Year: 2022

JOURNAL ARTICLE

Syntactic Inductive Biases for Natural Language Processing

Swayamdipta, Swabha

Journal: OPAL (Open@LaTrobe) (La Trobe University) · Year: 2022

JOURNAL ARTICLE

Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale

Laurent Sartran, Samuel Barrett, Adhiguna Kuncoro, Miloš Stanojević, Phil Blunsom, Chris Dyer

Journal: Transactions of the Association for Computational Linguistics · Year: 2022 · Vol: 10 · Pages: 1423-1439

DISSERTATION

Syntactic inductive biases for deep learning methods

Shen, Yikang

University: Papyrus: Institutional Repository (Université de Montréal) · Year: 2022