Haotian Tang, Zhijian Liu, Xiuyu Li, Yujun Lin, Song Han
This artifact contains the source code of the MLSys 2022 paper TorchSparse: Efficient Point Cloud Inference Engine. It includes the implementation of the TorchSparse inference engine and the artifact evaluation scripts. The goal of this artifact is to help readers reproduce our paper results and build new research on top of our work.

Paper abstract: Deep learning on point clouds has received increased attention thanks to its wide applications in AR/VR and autonomous driving. These applications require low latency and high accuracy to provide a real-time user experience and ensure user safety. Unlike conventional dense workloads, the sparse and irregular nature of point clouds poses severe challenges to running sparse CNNs efficiently on general-purpose hardware, and existing sparse acceleration techniques for 2D images do not translate to 3D point clouds. In this paper, we introduce TorchSparse, a high-performance point cloud inference engine that accelerates sparse convolution computation on GPUs. TorchSparse directly optimizes the two bottlenecks of sparse convolution: data movement and irregular computation. It optimizes data orchestration through quantization and fused, locality-aware memory access, reducing the memory movement cost by 2.7x. It also adopts adaptive matrix-multiplication (MM) grouping to trade computation for better regularity, achieving 1.4-1.5x speedup for matrix multiplication. Evaluated on seven representative models across three benchmark datasets, TorchSparse achieves 1.6x and 1.5x measured end-to-end speedups over the state-of-the-art MinkowskiEngine and SpConv, respectively.
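To make the two bottlenecks concrete, the sketch below illustrates the classic gather-matmul-scatter structure of sparse convolution and the basic idea behind MM grouping (padding the per-kernel-offset buffers to a common length so many small matmuls become one batched matmul). This is a minimal NumPy illustration, not the TorchSparse implementation; all function and variable names here are hypothetical, and the real engine operates on GPU kernel maps rather than Python dicts.

```python
import numpy as np

def gather_matmul_scatter(feats, weights, in_idx, out_idx, n_out):
    """One sparse-conv step: gather inputs, multiply, scatter-add outputs.

    feats:   (N_in, C_in) input point features
    weights: dict kernel_offset -> (C_in, C_out) weight matrix
    in_idx / out_idx: dict kernel_offset -> index arrays of equal length
    """
    C_out = next(iter(weights.values())).shape[1]
    out = np.zeros((n_out, C_out), dtype=feats.dtype)
    for k, W in weights.items():
        gathered = feats[in_idx[k]]           # gather (data movement)
        partial = gathered @ W                # small dense matmul per offset
        np.add.at(out, out_idx[k], partial)   # scatter-add (data movement)
    return out

def grouped_matmul(feats, weights, in_idx):
    """Grouping idea (simplified): pad each offset's gathered buffer to a
    common length L, then replace many small matmuls with one batched matmul.
    The padding is wasted computation, traded for better regularity."""
    ks = sorted(weights)
    L = max(len(in_idx[k]) for k in ks)
    G = np.zeros((len(ks), L, feats.shape[1]), dtype=feats.dtype)
    for i, k in enumerate(ks):
        G[i, :len(in_idx[k])] = feats[in_idx[k]]
    W = np.stack([weights[k] for k in ks])    # (K, C_in, C_out)
    return np.einsum('blc,bco->blo', G, W)    # one batched matmul
```

The adaptive part of the real scheme decides which offsets to group (and how much padding to tolerate) based on the actual bucket sizes; the sketch above simply groups everything.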