Accurate multivariate time-series forecasting underpins applications in energy, transport, finance, and weather. Yet with many Transformer variants now available, it is hard to pick a model that balances accuracy with compute cost. We present a clear, like-for-like benchmark of twelve Transformer architectures for multivariate time-series forecasting across six public datasets—Electricity Load (ECL), Exchange Rate, Traffic, Weather, Solar Energy, and ETT (ETTh1)—under a single training and evaluation pipeline. Performance is reported using MAE, RMSE, MAPE, and NWRMSLE, alongside efficiency indicators (seconds per epoch, peak VRAM), averaged over five seeds with train-only normalization; paired t-tests assess statistical significance. Three robust patterns emerge: (i) there is no universal best model—outcomes depend on seasonality, dimensionality, non-stationarity, and forecast horizon; (ii) models that encode seasonal/frequency structure or use patch-based temporal tokens tend to lead on strongly seasonal data; and (iii) in high-dimensional settings, architectures with efficient or structured attention achieve competitive errors with more favorable time–memory trade-offs. Rankings remain stable across absolute, percentage, and log-space metrics, indicating conclusions are not driven by scale effects or rare spikes. To ensure reproducibility, we provide preprocessing steps, hyperparameters, training schedules, and evaluation scripts, and offer guidance for selecting Transformer forecasters under accuracy and resource constraints, with forward paths in probabilistic evaluation, irregular sampling, continual adaptation, and efficiency-oriented deployment.
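The four error metrics named above can be sketched in a few lines of NumPy. This is a minimal, illustrative implementation under common definitions; the paper's exact NWRMSLE weighting scheme is not specified in the abstract, so the `weights` argument (uniform by default) and the `eps` guard in MAPE are assumptions, not the authors' code.

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean Absolute Error
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    # Root Mean Squared Error
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred, eps=1e-8):
    # Mean Absolute Percentage Error, in percent.
    # eps guards against division by zero targets (an assumption here).
    return np.mean(np.abs((y_true - y_pred) / (np.abs(y_true) + eps))) * 100.0

def nwrmsle(y_true, y_pred, weights=None):
    # Normalized Weighted Root Mean Squared Logarithmic Error:
    # squared error in log1p space, weighted, then normalized by total weight.
    # Uniform weights are a placeholder assumption.
    y_true = np.clip(y_true, 0.0, None)
    y_pred = np.clip(y_pred, 0.0, None)
    if weights is None:
        weights = np.ones_like(y_true, dtype=float)
    sq_log_err = (np.log1p(y_pred) - np.log1p(y_true)) ** 2
    return np.sqrt(np.sum(weights * sq_log_err) / np.sum(weights))
```

Because RMSLE-style metrics operate in log space, they damp the influence of rare large spikes, which is consistent with the abstract's observation that rankings are stable across absolute, percentage, and log-space metrics.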
A. Yarkın Yıldız, Emirhan Koç, Aykut Koç