Haozhong QiuChuanfu XuJianbin FangJian ZhangLiang DengZ. G. DaiYue DingYue WangZhiping HanYonggang CheJie Liu
Sparse iterative solvers are commonly used in various fields. However, certain essential kernels of these solvers, such as sparse triangular solves (SpTRSV), present significant challenges for efficient parallelization due to data dependencies . Previous methods, like level-scheduling or multi-coloring, typically involve creating a Task Dependency Graph (TDG) to represent data dependencies and identify independent sets from the TDG for parallel execution. However, these approaches often result in limited parallelism with substantial synchronization overheads or negatively impact the solver convergence rate. This article introduces DCSolver , a Divide-and-Conquer (DC) framework designed to efficiently parallelize sparse solvers with data dependencies on GPUs. To achieve this, we break down the solver TDG into independent subgraphs, allowing us to exploit both coarse-grained and fine-grained parallelism. To efficiently allocate GPU threads for subgraphs with varying degrees of parallelism, we have developed an adaptive in-warp scheduling strategy. Additionally, we propose a hybrid parallelization scheme in DCSolver, which involves employing different parallel approaches for different DC recursions to achieve a more optimal balance between parallelism and convergence for solvers. To evaluate the effectiveness of DCSolver, we apply it to two preconditioned Krylov subspace solvers and an unstructured mesh Computational Fluid Dynamics (CFD) solver. Our results show that when compared with the state-of-the-art methods, DCSolver accelerates the time-to-solution of solvers by an average speedup of up to 26.19X.
Mykola LukashKarl RuppS. Selberherr
Mark GatesStanimire TomovJack Dongarra
Paul LinPratik NayakAditya KashiDhruva KulkarniAaron ScheinbergHartwig Anzt