Long Sequence Time-Series Forecasting (LSTF) is an important and challenging research problem with broad applications. Recent studies have shown that Transformer-based models are effective at capturing correlations in time-series data, but their quadratic time and memory complexity makes them unsuitable for LSTF. In response, we investigate the impact of the long-tail distribution of attention scores on prediction accuracy and propose a Bis-Attention mechanism that uses a mean-based measurement to bi-directionally sparsify the self-attention matrix, enhancing the differentiation of attention scores and reducing the complexity of Transformer-based models from $O(L^{2})$ to $O((\log L)^{2})$. Moreover, we reduce memory consumption and simplify the model architecture through a shared-QK scheme. The effectiveness of the proposed method is verified by theoretical analysis and visualisation. Extensive experiments on three benchmarks demonstrate that our method outperforms other state-of-the-art methods, achieving an average reduction of 19.2% in MSE and 12% in MAE compared with Informer.
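The abstract's core idea, bi-directionally sparsifying the attention matrix with a mean-based measurement, can be sketched as follows. This is a minimal NumPy illustration under assumed details: the measurement (max score minus mean score per row/column, in the spirit of Informer's ProbSparse criterion), the keep-above-average selection rule, and the uniform fallback for dropped queries are all assumptions, not the paper's exact formulation.

```python
import numpy as np

def bis_attention(Q, K, V):
    """Sketch of bi-directionally sparsified attention (assumed details).

    Queries (rows) and keys (columns) whose mean-based importance
    measurement exceeds the average are kept; the rest are masked out
    before the softmax, so only a small score sub-matrix is normalised.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # raw attention scores, shape (L, L)

    # Mean measurement: the gap between a row's (or column's) max score
    # and its mean score; a large gap marks a "dominant" query or key.
    q_meas = scores.max(axis=1) - scores.mean(axis=1)
    k_meas = scores.max(axis=0) - scores.mean(axis=0)

    # Keep only rows/columns whose measurement exceeds the average -- a
    # stand-in for the sub-quadratic selection behind the O((log L)^2)
    # complexity claimed in the abstract.
    keep_q = q_meas >= q_meas.mean()
    keep_k = k_meas >= k_meas.mean()

    sparse = np.full_like(scores, -np.inf)
    sparse[np.ix_(keep_q, keep_k)] = scores[np.ix_(keep_q, keep_k)]

    # Softmax over the surviving entries; dropped queries fall back to a
    # uniform average of V (as Informer handles "lazy" queries).
    out = np.empty_like(V, dtype=float)
    active = sparse[keep_q]
    weights = np.exp(active - active.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    out[keep_q] = weights @ V
    out[~keep_q] = V.mean(axis=0)
    return out
```

With the shared-QK scheme the abstract mentions, one would additionally set `K = Q` (a single projection serving both roles), which halves the projection parameters and makes the score matrix symmetric up to scaling.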
Yang Li, Chunhua Tian, Yuyan Lan, Chentao Yu, Keqiang Xie