Hartwig Anzt, Terry Cojean, Yen-Chen Chen, Jack Dongarra, Goran Flegar, Pratik Nayak, Stanimire Tomov, Yu-Hsiang Tsai, Weichung Wang
Efficient processing of irregular matrices on Single Instruction, Multiple Data (SIMD)-type architectures is a persistent challenge. Resolving it requires innovations in the development of data formats, computational techniques, and implementations that strike a balance between thread divergence, which is inherent for irregular matrices, and padding, which alleviates the performance-detrimental thread divergence but introduces artificial overheads. To this end, in this article, we address the challenge of designing high-performance sparse matrix-vector product (SpMV) kernels for NVIDIA Graphics Processing Units (GPUs). We present a compressed sparse row (CSR) format suitable for unbalanced matrices. We also provide a load-balancing kernel for the coordinate (COO) matrix format and extend it to a hybrid algorithm that stores part of the matrix in the SIMD-friendly ELLPACK (ELL) format. The ratio between the ELL and COO parts is determined using a theoretical analysis of the nonzeros-per-row distribution. For the over 2,800 test matrices available in the SuiteSparse Matrix Collection, we compare the performance against SpMV kernels provided by NVIDIA's cuSPARSE library and a heavily tuned sliced ELL (SELL-P) kernel that prevents unnecessary padding by treating irregular matrices as a combination of matrix blocks stored in ELL format.
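The hybrid scheme described above can be illustrated with a small sketch: rows are split so that the first k nonzeros of each row go into a padded, uniform-width ELL structure, while any overflow entries remain in COO. This is an illustrative NumPy sketch, not the paper's CUDA implementation; the function names and the fixed threshold k are assumptions (the paper chooses the split from the nonzeros-per-row distribution).

```python
import numpy as np

def split_ell_coo(rows, cols, vals, n_rows, k):
    """Split COO input into an ELL part (up to k nonzeros per row,
    zero-padded) and a COO remainder holding the overflow entries.
    Illustrative sketch of the hybrid ELL/COO idea; in practice k
    would be derived from the nonzeros-per-row distribution."""
    ell_vals = np.zeros((n_rows, k))
    ell_cols = np.zeros((n_rows, k), dtype=int)
    counts = np.zeros(n_rows, dtype=int)
    coo = []  # irregular overflow stays in coordinate format
    for r, c, v in zip(rows, cols, vals):
        if counts[r] < k:
            ell_vals[r, counts[r]] = v
            ell_cols[r, counts[r]] = c
            counts[r] += 1
        else:
            coo.append((r, c, v))
    return ell_vals, ell_cols, coo

def hybrid_spmv(ell_vals, ell_cols, coo, x):
    """Compute y = A @ x: the ELL part has uniform row width (regular,
    SIMD-friendly), the COO remainder handles the irregular leftovers."""
    y = (ell_vals * x[ell_cols]).sum(axis=1)  # ELL: one fused row sweep
    for r, c, v in coo:                       # COO: scattered updates
        y[r] += v * x[c]
    return y
```

For a 3x3 matrix with one long row, choosing k = 2 keeps padding small while only a single entry falls back to the COO remainder, mirroring the trade-off between padding overhead and thread divergence discussed in the abstract.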
Hartwig Anzt, Gary Collins, Jack Dongarra, Goran Flegar, Enrique S. Quintana-Ortí
Sorin G. Nastea, Ophir Frieder, Tarek El-Ghazawi
Gonzalo Berger, Ernesto Dufrechou, Pablo Ezzatti
Francisco Javier Las Heras-Vázquez, José-Jesús Fernández, Ester M. Garzón