Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving

Ling‐Dong Kong; Xiang Xu; Jiawei Ren; Wenwei Zhang; Liang Pan; Kai Chen; Wei Tsang Ooi; Ziwei Liu

doi:10.1109/tpami.2025.3535625

JOURNAL ARTICLE

Year: 2025
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Vol: 47 (5)
Pages: 3748-3765
Publisher: IEEE Computer Society

Abstract

Efficient data utilization is crucial for advancing 3D scene understanding in autonomous driving, where reliance on heavily human-annotated LiDAR point clouds challenges fully supervised methods. Addressing this, our study extends into semi-supervised learning for LiDAR semantic segmentation, leveraging the intrinsic spatial priors of driving scenes and multi-sensor complements to augment the efficacy of unlabeled datasets. We introduce LaserMix++, an evolved framework that integrates laser beam manipulations from disparate LiDAR scans and incorporates LiDAR-camera correspondences to further assist data-efficient learning. Our framework is tailored to enhance 3D scene consistency regularization by incorporating multi-modality, including 1) multi-modal LaserMix operation for fine-grained cross-sensor interactions; 2) camera-to-LiDAR feature distillation that enhances LiDAR feature learning; and 3) language-driven knowledge guidance generating auxiliary supervisions using open-vocabulary models. The versatility of LaserMix++ enables applications across LiDAR representations, establishing it as a universally applicable solution. Our framework is rigorously validated through theoretical analysis and extensive experiments on popular driving perception datasets. Results demonstrate that LaserMix++ markedly outperforms fully supervised alternatives, achieving comparable accuracy with five times fewer annotations and significantly improving the supervised-only baselines. This substantial advancement underscores the potential of semi-supervised approaches in reducing the reliance on extensive labeled data in LiDAR-based 3D scene understanding systems.
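
As a rough illustration of what "laser beam manipulations from disparate LiDAR scans" could look like, the sketch below interleaves two point clouds by inclination band. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify how scans are partitioned, so the band-by-inclination scheme, the function names `inclination` and `beam_mix`, and the default angle range are all hypothetical choices made for illustration.

```python
import numpy as np


def inclination(points):
    """Inclination angle (radians) of each point above the sensor's horizontal plane.

    `points` is an (N, C) array whose first three columns are x, y, z.
    """
    horizontal_range = np.linalg.norm(points[:, :2], axis=1)
    return np.arctan2(points[:, 2], horizontal_range)


def beam_mix(scan_a, scan_b, num_areas=4, phi_min=-0.4, phi_max=0.1):
    """Interleave two scans by inclination band (illustrative assumption).

    The range [phi_min, phi_max] is split into `num_areas` equal bands.
    Even-indexed bands are taken from scan_a, odd-indexed bands from scan_b.
    Extra columns (intensity, labels, ...) travel with their points.
    The default angle range is hypothetical, not taken from the paper.
    """
    edges = np.linspace(phi_min, phi_max, num_areas + 1)
    interior_edges = edges[1:-1]  # out-of-range points fall into the end bands

    band_a = np.digitize(inclination(scan_a), interior_edges)
    band_b = np.digitize(inclination(scan_b), interior_edges)

    from_a = scan_a[band_a % 2 == 0]
    from_b = scan_b[band_b % 2 == 1]
    return np.concatenate([from_a, from_b], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic scans: columns are x, y, z.
    scan_a = rng.uniform([-50, -50, -3], [50, 50, 1], size=(1000, 3))
    scan_b = rng.uniform([-50, -50, -3], [50, 50, 1], size=(1000, 3))
    mixed = beam_mix(scan_a, scan_b)
    print(mixed.shape)
```

In a semi-supervised setting, one plausible use is to mix a labeled scan with an unlabeled one and carry labels or pseudo-labels along as an extra column, so that the mixed scan has per-point targets for a consistency objective.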

Keywords: Computer science; Artificial intelligence; Computer vision; Modal

Metrics

Cited By

FWCI (Field Weighted Citation Impact): 62.05

Refs: 109

Citation Normalized Percentile: 0.99

Is in top 1%

Is in top 10%

Topics

Image Processing and 3D Reconstruction

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

3D Shape Modeling and Analysis

Physical Sciences → Engineering → Computational Mechanics

Robotics and Sensor-Based Localization

Physical Sciences → Engineering → Aerospace Engineering

Related Documents

Scene Understanding for Autonomous Driving

Auditory Scene Understanding for Autonomous Driving

Multi-Modal Sensor Fusion-Based Deep Neural Network for End-to-End Autonomous Driving With Scene Understanding

Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models

An Efficient Multi-modal and Hyper-parametric Autonomous Driving HvDetFusion Algorithm