This paper presents batched iterative solvers for GPU architectures. We describe the design of the batched functionality, which aims for optimal performance while still letting the user choose a sparse matrix format, a preconditioner optimized for the individual items of the batch, and an application-specific stopping criterion that is evaluated individually for each problem in the batch. Performance results for benchmark problems from PeleLM simulations demonstrate the potential of batched iterative solvers for computational chemistry simulations and their advantage over current vendor-provided batched solutions.
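To make the idea of a per-problem stopping criterion concrete, the following is a minimal sketch, not the implementation described in the paper. It assumes small dense systems stored as nested lists and uses a Jacobi-preconditioned Richardson iteration as a stand-in for the actual solvers; the function name `batched_richardson` and its parameters are hypothetical. The sequential loop over `k` stands in for whatever parallel mapping a GPU implementation uses.

```python
# Illustrative sketch only: a batched Jacobi-preconditioned Richardson
# iteration with a per-system stopping criterion. All names are hypothetical
# and the dense nested-list storage is an assumption made for brevity.

def batched_richardson(mats, rhs, rel_tol=1e-10, max_iters=500):
    """Solve A_k x_k = b_k for every system k in the batch.

    Each system keeps its own convergence flag, so a system that has met
    its relative-residual tolerance is no longer updated while the
    remaining systems keep iterating.
    """
    n_sys = len(mats)
    sols = [[0.0] * len(b) for b in rhs]  # zero initial guess per system
    iters = [0] * n_sys                   # update count per system
    converged = [False] * n_sys           # convergence flag per system

    def norm(v):
        return sum(x * x for x in v) ** 0.5

    for _ in range(max_iters):
        if all(converged):
            break
        for k in range(n_sys):
            if converged[k]:
                continue  # this system is finished; skip it
            A, b, x = mats[k], rhs[k], sols[k]
            n = len(b)
            # Residual r = b - A x for this system only.
            r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
            # Stopping criterion evaluated individually for system k.
            if norm(r) <= rel_tol * norm(b):
                converged[k] = True
                continue
            # Jacobi preconditioner: scale the residual by the inverse diagonal.
            for i in range(n):
                x[i] += r[i] / A[i][i]
            iters[k] += 1
    return sols, iters, converged
```

With a batch that mixes a diagonal system and a coupled one, the diagonal system is flagged as converged after a single update, while the coupled system continues to iterate until it meets the same relative tolerance on its own residual.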
Aditya Kashi, Pratik Nayak, Dhruva Kulkarni, Aaron Scheinberg, Paul Lin, Hartwig Anzt
Isha Aggarwal, Pratik Nayak, Aditya Kashi, Hartwig Anzt
Mykola Lukash, Karl Rupp, S. Selberherr