JOURNAL ARTICLE

WinStat: A Family of Trainable Positional Encodings for Transformers in Time Series Forecasting

Cristhian Moya-Mota, Ignacio Aguilera-Martos, Diego García-Gil, Julián Luengo

Year: 2025 | Journal: Machine Learning and Knowledge Extraction | Vol: 8(1) | Pages: 7 | Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Transformers for time series forecasting rely on positional encoding to inject temporal order into the permutation-invariant self-attention mechanism. Classical sinusoidal absolute encodings are fixed and purely geometric; learnable absolute encodings often overfit and fail to extrapolate, while relative or advanced schemes can impose substantial computational overhead without being sufficiently tailored to temporal data. This work introduces a family of window-statistics positional encodings that explicitly incorporate local temporal semantics into the representation of each timestamp. The base variant (WinStat) augments inputs with statistics computed over a sliding window; WinStatLag adds explicit lag-difference features; and hybrid variants (WinStatFlex, WinStatTPE, WinStatSPE) learn soft mixtures of window statistics with absolute, learnable, and semantic positional signals, preserving the simplicity of additive encodings while adapting to local structure and informative lags. We evaluate the proposed encodings against state-of-the-art alternatives on four heterogeneous benchmarks: Electricity Transformer Temperature (hourly variants), Individual Household Electric Power Consumption, New York City Yellow Taxi Trip Records, and a large-scale industrial time series from heavy machinery. All experiments use a controlled Transformer backbone with full self-attention to isolate the effect of positional information. Across datasets, the proposed methods consistently reduce mean squared error and mean absolute error relative to a strong Transformer baseline with sinusoidal positional encoding and to state-of-the-art encodings for time series, with WinStatFlex and WinStatTPE emerging as the most effective variants. In ablation studies, randomly shuffling decoder inputs markedly degrades the proposed methods, supporting the conclusion that their gains arise from learned order-aware locality and semantic structure rather than incidental artifacts. A simple and reproducible heuristic for setting the sliding-window length (roughly one quarter to one third of the input sequence length) provides robust performance without the need for exhaustive tuning.
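
To make the core idea concrete, below is a minimal PyTorch sketch of a WinStat-style window-statistics feature. It assumes mean and standard deviation as the window statistics and simple concatenation with the inputs; the function name winstat_features, the causal (trailing) window, and the fusion by concatenation are illustrative assumptions based on the abstract, not the authors' exact formulation, and the learned mixtures of the hybrid variants (WinStatFlex, WinStatTPE, WinStatSPE) are omitted.

import torch
import torch.nn.functional as F

def winstat_features(x: torch.Tensor, window: int) -> torch.Tensor:
    """Append causal sliding-window mean and std to each timestep.

    Illustrative sketch of the WinStat idea, not the paper's exact method.
    x: (batch, seq_len, channels) input series; window >= 2.
    Returns: (batch, seq_len, 3 * channels).
    """
    b, t, c = x.shape
    # Left-pad so each position only sees its trailing `window` values.
    xp = F.pad(x.transpose(1, 2), (window - 1, 0), mode="replicate")  # (b, c, t + window - 1)
    patches = xp.unfold(2, window, 1)                                 # (b, c, t, window)
    mean = patches.mean(dim=-1)                                       # (b, c, t)
    std = patches.std(dim=-1)                                         # (b, c, t)
    stats = torch.cat([mean, std], dim=1).transpose(1, 2)             # (b, t, 2c)
    return torch.cat([x, stats], dim=-1)                              # (b, t, 3c)

# Window set to one quarter of the input length, following the paper's heuristic.
x = torch.randn(8, 96, 1)
feats = winstat_features(x, window=96 // 4)
print(feats.shape)  # torch.Size([8, 96, 3])

The heuristic from the abstract is applied in the usage line: for a 96-step input, a window of 24 to 32 steps falls in the recommended one-quarter-to-one-third range.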

Keywords:
Transformer; Encoding; Time series; Computation; Heuristic; Pattern recognition; Overfitting

Topics

Time Series Analysis and Forecasting (Physical Sciences → Computer Science → Signal Processing)
Traffic Prediction and Management Techniques (Physical Sciences → Engineering → Building and Construction)
Machine Learning in Healthcare (Physical Sciences → Computer Science → Artificial Intelligence)