DISSERTATION

Scalable syntactic inductive biases for neural language models

Kuncoro, Adhiguna Surya

Year: 2022 · University: Oxford University Research Archive (ORA) (University of Oxford) · Publisher: University of Oxford

Abstract

Natural language has a sequential surface form, although its underlying structure has been argued to be hierarchical and tree-structured in nature, whereby smaller linguistic units like words are recursively composed to form larger ones, such as phrases and sentences. This thesis aims to answer the following open research questions: To what extent---if at all---can more explicit notions of hierarchical syntactic structures further improve the performance of neural models within NLP, even within the context of successful models like BERT that learn from large amounts of data? And where exactly would stronger notions of syntactic structures be beneficial in different types of language understanding tasks?

To answer these questions, we explore two approaches for augmenting neural sequence models with an inductive bias that encourages a more explicit modelling of hierarchical syntactic structures. In the first approach, we use existing techniques that design tree-structured neural networks, where the ordering of the computational operations is determined by hierarchical syntax trees. We discover that this approach is indeed effective for designing better and more robust models at various challenging benchmarks of syntactic competence, although these benefits nevertheless come at the expense of scalability: In practice, such tree-structured models are much more challenging to scale to large datasets.

Hence, in the second approach, we devise a novel knowledge distillation strategy for combining the best of both syntactic inductive biases and data scale. Our proposed approach is effective across different neural sequence modelling architectures and objective functions: By applying our approach on top of a left-to-right LSTM, we design a distilled syntax-aware (DSA) LSTM that achieves a new state of the art (as of mid-2019) and human-level performance at targeted syntactic evaluations. By applying our approach on top of a Transformer-based BERT masked language model that works well at scale, we outperform a strong BERT baseline on six structured prediction tasks---including those that are not explicitly syntactic in nature---in addition to the Corpus of Linguistic Acceptability (CoLA). Notably, our approach yields a new state of the art (as of mid-2020)---among models pre-trained on the original BERT dataset---on four structured prediction tasks: In-domain and out-of-domain phrase-structure parsing, dependency parsing, and semantic role labelling.

Altogether, our findings and methods in this work: (i) provide an example of how existing linguistic theories (particularly concerning the syntax of language), annotations, and resources can be used both as diagnostic evaluation tools, and also as a source of prior knowledge for crafting inductive biases that can improve the performance of computational models of language; (ii) showcase the continued relevance and benefits of more explicit syntactic inductive biases, even within the context of scalable neural models like BERT that can derive their knowledge from large amounts of data; (iii) contribute to a better understanding of where exactly syntactic biases are most helpful in different types of NLP tasks; and (iv) motivate the broader question of how we can design models that integrate stronger syntactic biases---and yet can be easily scalable at the same time---as a promising (if relatively underexplored) direction of NLP research.
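The distillation strategy summarised above can be illustrated with a minimal sketch: a student sequence model is trained on an interpolation of the usual next-word cross-entropy and a KL term that pulls its predictive distribution towards that of a syntax-aware teacher. This is a generic word-level distillation objective, not the thesis's exact formulation; the function name `distillation_loss` and the `alpha` and `temperature` parameters are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      alpha=0.5, temperature=1.0):
    """Interpolate hard-label cross-entropy with a soft-label KL term.

    student_logits, teacher_logits: (batch, vocab) unnormalised scores.
    targets: (batch,) gold next-word indices.
    alpha: weight on the teacher's soft labels.
    """
    # Hard-label term: standard language-modelling cross-entropy.
    ce = F.cross_entropy(student_logits, targets)
    # Soft-label term: KL(teacher || student) over the vocabulary,
    # optionally smoothed by a temperature.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(student_logprobs, teacher_probs, reduction="batchmean")
    return alpha * kl + (1.0 - alpha) * ce
```

With `alpha = 0` this reduces to ordinary language-model training; with `alpha = 1` the student learns purely from the teacher's distributions, which is how a syntactic bias can be transferred to a scalable student without running the expensive teacher at inference time.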

Keywords:
Syntax; Language model; Inductive bias; Scalability; Artificial neural network; Sequence; Natural language; Context

Metrics

Cited by: 0
FWCI (Field-Weighted Citation Impact): 0.00
References: 0

Topics

- Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
- Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
- Machine Learning in Healthcare (Physical Sciences → Computer Science → Artificial Intelligence)

Related Documents

JOURNAL ARTICLE

Syntactic Inductive Biases for Natural Language Processing

Swayamdipta, Swabha

Journal: KiltHub Repository · Year: 2022

JOURNAL ARTICLE

Syntactic Inductive Biases for Natural Language Processing

Swayamdipta, Swabha

Journal: OPAL (Open@LaTrobe) (La Trobe University) · Year: 2022

JOURNAL ARTICLE

Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale

Laurent Sartran, Samuel Barrett, Adhiguna Kuncoro, Miloš Stanojević, Phil Blunsom, Chris Dyer

Journal: Transactions of the Association for Computational Linguistics · Year: 2022 · Vol: 10 · Pages: 1423-1439

DISSERTATION

Syntactic inductive biases for deep learning methods

Shen, Yikang

University: Papyrus: Institutional Repository (Université de Montréal) · Year: 2022