Abstract

The attention mechanism is a powerful and effective method widely used in natural language processing: it allows a model to focus on the important parts of the input sequence. The Transformer model relies on attention in place of recurrent and convolutional neural networks, eliminating operations whose complexity grows with the distance between words in a sequence. However, attention by itself is insensitive to positional information, so positional encoding is crucial for Transformer-like models that rely heavily on this mechanism. To make such models position-aware, the position of each input word is typically incorporated into the input token embeddings as an additional embedding. The purpose of this paper is to conduct a systematic study of different positional encoding methods. We briefly describe the components of the attention mechanism, its role in the Transformer, and the Transformer's encoder-decoder architecture, and we study how sharing positional encodings across the heads and layers of a Transformer affects model performance. The methodology of the study is based on general research methods of analysis and synthesis, experimental testing, and quantitative analysis to examine and compare the efficacy and performance of different positional encoding techniques used in Transformer models. The results show that absolute and relative encodings yield similar overall model performance, while relative encodings work much better on longer sentences. We found that the original encoder-decoder form worked best for machine translation and question answering: despite using twice as many parameters as "encoder-only" or "decoder-only" architectures, an encoder-decoder model has a similar computational cost. In addition, the number of learnable parameters can often be reduced without loss of performance.

Practical implications. Positional encoding is essential for enabling Transformer models to process data effectively by preserving sequence order, handling variable-length sequences, and improving generalization. Its inclusion contributes significantly to the success of Transformer-based architectures across natural language processing tasks.

Value/originality. Positional encoding is a critical issue for Transformer-like models, yet how it establishes positional dependencies within a sequence has not been fully explored. We analyze several approaches to positional encoding in the context of question answering and machine translation because the influence of positional encoding on NLP models with respect to word order remains ambiguous and requires further exploration.
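
As a minimal illustration of the additive scheme described in the abstract, the sketch below builds sinusoidal absolute positional encodings (the fixed formulation of Vaswani et al., 2017) and adds them to a matrix of token embeddings. The function name, dimensions, and random embeddings are illustrative assumptions for this page, not code from the paper.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal positional encodings (Vaswani et al., 2017).

    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    Assumes an even d_model.
    """
    positions = np.arange(seq_len)[:, None]           # shape (seq_len, 1)
    even_dims = np.arange(0, d_model, 2)[None, :]     # shape (1, d_model // 2)
    angles = positions / np.power(10000.0, even_dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                      # even dimensions
    pe[:, 1::2] = np.cos(angles)                      # odd dimensions
    return pe

# Position information is injected simply by adding the encoding to the
# token embeddings before the first attention layer.
seq_len, d_model = 10, 16
token_embeddings = np.random.randn(seq_len, d_model)  # stand-in for learned embeddings
position_aware_input = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```

Relative encodings, by contrast, inject position information into the attention computation itself (for example, as pairwise offsets between query and key positions) rather than into the input embeddings, which is one reason they are often reported to handle longer sequences better.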

Keywords:
Computer science, Transformer, Encoder, Token, Embedding, Artificial intelligence, Encoding

Topics

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
