JOURNAL ARTICLE

Transformer Language Models without Positional Encodings Still Learn Positional Information

Abstract

Causal transformer language models (LMs), such as GPT-3, typically require some form of positional encoding, such as positional embeddings. However, we show that LMs without any explicit positional encoding are still competitive with standard models and that this phenomenon is robust across different datasets, model sizes, and sequence lengths. Probing experiments reveal that such models acquire an implicit notion of absolute positions throughout the network, effectively compensating for the missing information. We conjecture that causal attention enables the model to infer the number of predecessors that each token can attend to, thereby approximating its absolute position. Our findings indicate that causal LMs might derive positional awareness not only from the explicit positioning mechanism but also from the effects of the causal mask.
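As a minimal illustration of this conjecture (a sketch, not code from the paper), the NumPy example below shows how the causal mask alone can expose absolute position: under uniform causal attention, each token spreads its attention equally over itself and its predecessors, so a value feature that is 1 only at the first token produces an output of 1/(i+1) at position i, from which absolute position can be read off. The uniform weights and the first-token indicator feature are illustrative assumptions, not the model's learned behavior.

import numpy as np

seq_len = 8

# Uniform causal attention: token i attends equally to tokens 0..i (weight 1/(i+1) each).
weights = np.tril(np.ones((seq_len, seq_len)))
weights /= weights.sum(axis=-1, keepdims=True)

# An illustrative value feature that is 1 at the first token and 0 elsewhere.
v = np.zeros(seq_len)
v[0] = 1.0

# Attention output at position i is 1/(i+1): a monotone code for absolute position.
out = weights @ v
print(out)                      # [1.  0.5  0.333...  0.25  ...  0.125]
print(np.round(1.0 / out) - 1)  # recovers [0. 1. 2. 3. 4. 5. 6. 7.]

In a trained LM the attention weights would not be exactly uniform, but normalizing over a causal prefix that grows with position provides the kind of signal that the paper's probes could map back to absolute position.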

Keywords:
Computer science, Transformer, Token, Language model, Encoding, Question answering, Artificial intelligence, Position

Metrics

Cited By: 47
FWCI (Field Weighted Citation Impact): 8.81
Refs: 20
Citation Normalized Percentile: 0.97 (in the top 1% and top 10%)

Topics

Topic Modeling: Physical Sciences → Computer Science → Artificial Intelligence
Explainable Artificial Intelligence (XAI): Physical Sciences → Computer Science → Artificial Intelligence
Machine Learning in Healthcare: Physical Sciences → Computer Science → Artificial Intelligence