JOURNAL ARTICLE

Long-range Sequence Modeling with Predictable Sparse Attention

Yimeng Zhuang, Jing Zhang, Mei Tu

Year: 2022 · Venue: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) · Pages: 234-243

Abstract

The self-attention mechanism has been shown to be an effective approach for capturing global context dependencies in sequence modeling, but it suffers from quadratic complexity in time and memory usage. Due to the sparsity of the attention matrix, much of this computation is redundant. Therefore, in this paper, we design an efficient Transformer architecture, named Fourier Sparse Attention for Transformer (FSAT), for fast long-range sequence modeling. We provide a brand-new perspective for constructing a sparse attention matrix, i.e. making the sparse attention matrix predictable. Two core sub-modules are: (1) A fast Fourier transform based hidden state cross module, which captures and pools L² semantic combinations in 𝒪(L log L) time complexity. (2) A sparse attention matrix estimation module, which predicts the dominant elements of an attention matrix based on the output of the previous hidden state cross module. By reparameterization and gradient truncation, FSAT successfully learns the indices of the dominant elements. The overall complexity with respect to the sequence length is reduced from 𝒪(L²) to 𝒪(L log L). Extensive experiments (natural language, vision, and math) show that FSAT remarkably outperforms standard multi-head attention and its variants on various long-sequence tasks at low computational cost, and achieves new state-of-the-art results on the Long Range Arena benchmark.
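The abstract's two sub-modules can be illustrated with a minimal NumPy sketch: global 𝒪(L log L) token mixing via an FFT-based circular convolution, followed by a per-query top-k selection of predicted dominant attention entries. This is a simplified stand-in, not the authors' implementation — the filter, the dot-product proxy score, and all names here are hypothetical, and the paper's reparameterization and gradient-truncation tricks for learning the indices are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def fft_mix(h, filt):
    """Global token mixing in O(L log L): circular convolution of hidden
    states with a filter via FFT along the sequence axis (a simplified
    stand-in for the paper's hidden state cross module)."""
    L = h.shape[0]
    return np.fft.irfft(np.fft.rfft(h, axis=0) * np.fft.rfft(filt, axis=0),
                        n=L, axis=0)

def predicted_sparse_attention(q, k, v, mixed, top_k=4):
    """Compute attention only at entries predicted to dominate.

    A cheap proxy score (dot products of the FFT-mixed states; a
    hypothetical choice) ranks key positions per query; exact attention
    scores are then evaluated only for the top_k selected keys."""
    L, d = q.shape
    # Dense proxy matrix here for clarity; FSAT predicts the dominant
    # indices without materializing the full L x L attention matrix.
    est = mixed @ mixed.T
    idx = np.argsort(-est, axis=1)[:, :top_k]   # predicted dominant keys per query
    out = np.zeros_like(v)
    for i in range(L):
        s = q[i] @ k[idx[i]].T / np.sqrt(d)     # exact scores on selected entries only
        w = np.exp(s - s.max())                 # numerically stable softmax
        w /= w.sum()
        out[i] = w @ v[idx[i]]
    return out, idx

L, d = 16, 8
h = rng.normal(size=(L, d))
filt = rng.normal(size=(L, d))
q, k, v = (rng.normal(size=(L, d)) for _ in range(3))
out, idx = predicted_sparse_attention(q, k, v, fft_mix(h, filt), top_k=4)
```

Each query attends to only top_k keys, so the attention step costs 𝒪(L · top_k · d) instead of 𝒪(L² · d); the FFT mixing that feeds the predictor is the 𝒪(L log L) component.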

Keywords:
Computer science, Computational complexity theory, Algorithm, Sparse matrix, Transformer, Matrix decomposition, Theoretical computer science, Artificial intelligence, Mathematics

Metrics

Cited By: 7
FWCI (Field-Weighted Citation Impact): 0.82
References: 36
Citation Normalized Percentile: 0.71

Topics

Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Graph Neural Networks
Physical Sciences →  Computer Science →  Artificial Intelligence
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Elastic Sparse Attention for Long-Sequence Modeling

Wu, Hecong

Journal:   Zenodo (CERN European Organization for Nuclear Research) Year: 2025
JOURNAL ARTICLE

Iterative Sparse Attention for Long-sequence Recommendation

Guan‐Yu Lin, Jinwei Luo, Yinfeng Li, Gao Chen, Qiang Luo, Depeng Jin

Journal: Proceedings of the AAAI Conference on Artificial Intelligence · Year: 2025 · Vol: 39 (11) · Pages: 12147-12155