Abstract

The attention mechanism is a powerful and effective method widely used in natural language processing: it allows a model to focus on the important parts of the input sequence. The Transformer model relies on attention in place of recurrent and convolutional neural networks, eliminating operations whose complexity grows with the distance between words in a sequence. However, attention by itself is insensitive to positional information, so positional encoding is crucial for Transformer-like models that rely heavily on this mechanism. To make such models position-aware, the position of each input word is typically incorporated into the input token embeddings as an additional embedding. The purpose of this paper is to conduct a systematic study of different positional encoding methods. We briefly describe the components of the attention mechanism, its role in the Transformer, and the Transformer's encoder-decoder architecture, and we study how sharing positional encodings across the heads and layers of a Transformer affects model performance. The methodology of the study is based on general research methods of analysis and synthesis, experimental testing, and quantitative analysis to examine and compare the efficacy and performance of different positional encoding techniques used in Transformer models. The results show that absolute and relative encodings yield similar overall model performance, while relative encodings work much better on longer sentences. We found that the original encoder-decoder form worked best for machine translation and question answering: despite using twice as many parameters as "encoder-only" or "decoder-only" architectures, an encoder-decoder model has a similar computational cost. In addition, the number of learnable parameters can often be reduced without loss of performance.

Practical implications. Positional encoding is essential for enabling Transformer models to process data effectively by preserving sequence order, handling variable-length sequences, and improving generalization. Its inclusion contributes significantly to the success of Transformer-based architectures across natural language processing tasks.

Value/originality. Positional encoding is a critical issue for Transformer-like models, yet how it establishes positional dependencies within a sequence has not been fully explored. We analyze several approaches to positional encoding in the context of question answering and machine translation because the influence of positional encoding on NLP models with respect to word order remains ambiguous and requires further exploration.
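
As a minimal illustration of the additive scheme described in the abstract, the sketch below builds sinusoidal absolute positional encodings (the fixed formulation of Vaswani et al., 2017) and adds them to a matrix of token embeddings. The function name, dimensions, and random embeddings are illustrative assumptions for this page, not code from the paper.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal positional encodings (Vaswani et al., 2017).

    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    Assumes an even d_model.
    """
    positions = np.arange(seq_len)[:, None]           # shape (seq_len, 1)
    even_dims = np.arange(0, d_model, 2)[None, :]     # shape (1, d_model // 2)
    angles = positions / np.power(10000.0, even_dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                      # even dimensions
    pe[:, 1::2] = np.cos(angles)                      # odd dimensions
    return pe

# Position information is injected simply by adding the encoding to the
# token embeddings before the first attention layer.
seq_len, d_model = 10, 16
token_embeddings = np.random.randn(seq_len, d_model)  # stand-in for learned embeddings
position_aware_input = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```

Relative encodings, by contrast, inject position information into the attention computation itself (for example, as pairwise offsets between query and key positions) rather than into the input embeddings, which is one reason they are often reported to handle longer sequences better.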

Keywords:
Computer science, Transformer, Encoder, Token, Embedding, Artificial intelligence, Encoding

Topics

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
