Recently, there has been growing interest in image and video coding tailored for intelligent analysis tasks, such as Image/Video Coding for Machines (ICM/VCM). These approaches have shown remarkable results compared to coding methods designed for human perception. However, point cloud coding for machine intelligence has not been extensively studied. Inspired by current LiDAR point cloud analysis methods, which convert point clouds into a bird's-eye-view (BEV) representation, we propose PC4M, an end-to-end learnt point cloud coding framework for 3D machine intelligence tasks based on BEV representation. Specifically, the PC4M system consists of a LiDAR encoder, a learnt BEV feature codec, and a BEV region proposal network with a task-specific head. To achieve better rate-distortion performance on analysis tasks, we propose an efficient Res-NeXt fusion block with strong multi-scale modeling ability to compress sparse BEV features, and design a long-distance adaptive attention module using a VanAtten block. Experimental results demonstrate that our method outperforms the state-of-the-art MPEG standard Geometry-based Point Cloud Compression (G-PCC) on object detection and BEV map segmentation, with BD-rate gains of 83.12% and 85.32% on nuScenes, respectively. To the best of our knowledge, this is the first end-to-end learnt task-oriented point cloud codec.
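The pipeline described above (LiDAR points projected to a BEV grid, followed by a feature codec whose output feeds a task head) can be sketched at a toy scale. This is a hedged illustration only: `points_to_bev` and `codec` are hypothetical simplifications invented here, with uniform quantization standing in for the paper's learnt BEV feature codec and a nonzero-symbol count standing in for the true bitrate.

```python
import numpy as np

def points_to_bev(points, grid=(8, 8), extent=4.0):
    """Project LiDAR points (N, 3) into a 2-channel bird's-eye-view grid.
    Channels: cell occupancy and maximum point height per cell.
    Hypothetical stand-in for the paper's LiDAR encoder."""
    h, w = grid
    bev = np.zeros((2, h, w), dtype=np.float32)
    cell = (2.0 * extent) / np.array([h, w])  # metres per BEV cell
    ij = np.floor((points[:, :2] + extent) / cell).astype(int)
    keep = (ij >= 0).all(axis=1) & (ij[:, 0] < h) & (ij[:, 1] < w)
    for (i, j), z in zip(ij[keep], points[keep, 2]):
        bev[0, i, j] = 1.0                      # occupancy
        bev[1, i, j] = max(bev[1, i, j], z)     # max height
    return bev

def codec(bev, step=0.5):
    """Toy BEV feature codec: uniform quantization replaces the learnt
    encoder/decoder, and the count of nonzero quantized symbols serves
    as a crude bitrate proxy (not the paper's entropy model)."""
    q = np.round(bev / step)
    rate = int(np.count_nonzero(q))
    return q * step, rate
```

A downstream task head would then consume the reconstructed BEV tensor; the rate-distortion trade-off is controlled here only by the quantization `step`, whereas the actual system learns this jointly with the analysis loss.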
Zining Wang, Wei Zhan, Masayoshi Tomizuka
Li Liang, Naveed Akhtar, Jordan Vice, Ajmal Mian