JOURNAL ARTICLE

Elastic Sparse Attention for Long-Sequence Modeling

Wu, Hecong

Year: 2025 · Journal: Zenodo · Publisher: CERN (European Organization for Nuclear Research)

Abstract

The quadratic complexity of the standard attention mechanism in Transformers remains a primary bottleneck for processing long sequences. While sparse attention methods offer a promising solution, they often rely on fixed, static patterns or complex, learned mechanisms that may not be optimal across all layers of a deep network. We introduce **Elastic Sparse Attention (ESA)**, a novel sparse attention mechanism where the attention pattern deterministically and smoothly adapts based on layer depth. Early layers in the network employ a dense, local attention pattern to capture fine-grained local context, while deeper layers transition to a more dilated, long-range pattern to integrate global information. This layer-adaptive strategy is designed to create a comprehensive receptive field by the final layer, mitigating the risk of "attention holes." We present the algorithm, an optimized Triton kernel implementation, a method for visualizing the patterns, and a rigorous validation script that confirms full receptive field coverage for sequences up to 131,072 tokens. Code is available at https://github.com/HighCWu/elastic-sparse-attention.
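The abstract's core idea, a sparse attention pattern whose dilation grows with layer depth, can be sketched as a mask generator. The function name `elastic_mask`, the geometric depth-to-dilation schedule, and the `window` parameter are illustrative assumptions; the paper's actual parameterization and Triton kernel are not reproduced here.

```python
import numpy as np

def elastic_mask(seq_len, layer, num_layers, window=4):
    """Boolean causal attention mask for one layer (illustrative sketch).

    Query i attends to keys i - k * dilation for k in 0..window-1.
    The dilation is 1 in the first layer (dense local attention) and
    grows roughly geometrically with depth (an assumed schedule, not
    necessarily the paper's), so deep layers reach far-away tokens.
    """
    depth_frac = layer / max(num_layers - 1, 1)  # 0.0 at first layer, 1.0 at last
    max_exp = np.log2(max(seq_len // window, 1))  # cap dilation so the pattern fits
    dilation = 2 ** round(depth_frac * max_exp)
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        for k in range(window):
            j = i - k * dilation
            if j >= 0:
                mask[i, j] = True  # strided, causal: never attends to the future
    return mask
```

With 16 tokens and 4 layers, layer 0 yields a contiguous local window (dilation 1) and layer 3 a dilation of 4; stacking such layers is what would let the final layer's receptive field cover the whole sequence, which is the property the paper's validation script checks.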

Keywords:
Sparse attention · Long-sequence modeling · Transformers · Receptive field · Triton kernels



Related Documents

JOURNAL ARTICLE
JOURNAL ARTICLE

Long-range Sequence Modeling with Predictable Sparse Attention

Yimeng Zhuang, Jing Zhang, Mei Tu

Journal: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) · Year: 2022 · Pages: 234-243
JOURNAL ARTICLE

Iterative Sparse Attention for Long-sequence Recommendation

Guan-Yu Lin, Jinwei Luo, Yinfeng Li, Gao Chen, Qiang Luo, Depeng Jin

Journal: Proceedings of the AAAI Conference on Artificial Intelligence · Year: 2025 · Vol: 39 (11) · Pages: 12147-12155