The quadratic complexity of the standard attention mechanism in Transformers remains a primary bottleneck for processing long sequences. While sparse attention methods offer a promising solution, they often rely on fixed, static patterns or on complex, learned mechanisms that may not be optimal across all layers of a deep network. We introduce **Elastic Sparse Attention (ESA)**, a novel sparse attention mechanism whose attention pattern deterministically and smoothly adapts with layer depth. Early layers employ a dense, local attention pattern to capture fine-grained local context, while deeper layers transition to a more dilated, long-range pattern to integrate global information. This layer-adaptive strategy is designed to yield a comprehensive receptive field by the final layer, mitigating the risk of "attention holes." We present the algorithm, an optimized Triton kernel implementation, a method for visualizing the patterns, and a rigorous validation script that confirms full receptive field coverage for sequences up to 131,072 tokens. Code is available at https://github.com/HighCWu/elastic-sparse-attention.
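The core idea can be sketched in plain NumPy. The snippet below is an illustrative model of a layer-adaptive dilated pattern and of the receptive-field check the abstract describes, not the paper's Triton kernel: the function names (`esa_mask`, `full_receptive_field`), the `window` parameter, and the exponential dilation schedule are all assumptions made for this sketch.

```python
import numpy as np

def esa_mask(seq_len, layer_idx, num_layers, window=4):
    """Boolean causal attention mask for one layer (illustrative).

    Layer 0 uses a dense local window (dilation 1); the dilation grows
    smoothly with depth, reaching roughly seq_len/window at the last
    layer. The exact schedule in ESA may differ; this interpolation is
    an assumption of the sketch.
    """
    frac = layer_idx / max(num_layers - 1, 1)
    dilation = max(1, round((seq_len / window) ** frac))
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    off = i - j
    # Attend to `window` causal positions spaced `dilation` apart.
    return (off >= 0) & (off % dilation == 0) & (off // dilation < window)

def full_receptive_field(seq_len, num_layers, window=4):
    """Check that stacking the per-layer masks leaves no 'attention
    holes': every query at the top layer can reach every earlier token.
    Reachability composes across layers as a boolean matrix product."""
    reach = np.eye(seq_len, dtype=bool)
    for layer in range(num_layers):
        m = esa_mask(seq_len, layer, num_layers, window)
        reach = (m.astype(int) @ reach.astype(int)) > 0
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    return bool((reach | ~causal).all())
```

For example, `full_receptive_field(64, 4)` holds with four layers (offsets 0–3, then strides 3, 6, and 16 compose into a gap-free causal receptive field), whereas a single local-window layer leaves distant tokens unreachable. The validation script in the repository performs this kind of coverage check at much larger scale (up to 131,072 tokens), presumably with a more memory-efficient reachability computation than a dense boolean matmul.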