Shuang Liu, Ying Li, Huankun Sheng
Drivable area (free space) detection is an essential part of the perception system of an autonomous vehicle: it helps intelligent vehicles understand road conditions and determine safe driving areas. Most drivable area detection algorithms are based on semantic segmentation, which classifies each pixel into a category, and recent advances in convolutional neural networks (CNNs) have significantly advanced semantic segmentation in driving scenarios. Although promising results have been obtained, existing CNN-based drivable area detection methods usually process one local neighborhood at a time; the locality of the convolution operation fails to capture long-range dependencies. To address this problem, we propose Multi-Swin, an improved Swin Transformer based on shifted windows. First, an improved patch merging strategy is proposed to enhance feature interactions between adjacent patches. Second, a decoder with upsampling layers is designed to restore the resolution of the feature map. Finally, a multi-scale fusion module is used to improve the representation of global semantic and geometric information. Our method is evaluated on the publicly available Cityscapes dataset. Experimental results show that it achieves 91.92% IoU on road segmentation, surpassing state-of-the-art methods.
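The abstract does not specify how the improved patch merging differs from the original, so as context the sketch below shows the standard Swin-style patch merging that such a strategy builds on: each 2×2 neighborhood of patches is concatenated along the channel axis, halving spatial resolution and quadrupling channels (in Swin a linear projection then reduces the channels). The feature map is represented here as plain nested lists for illustration; a real implementation would operate on tensors.

```python
def patch_merging(fmap):
    """Swin-style patch merging (baseline sketch, not the paper's improved
    variant): concatenate the channels of every 2x2 patch neighborhood,
    producing an (H/2) x (W/2) map with 4x the channels."""
    H, W = len(fmap), len(fmap[0])
    assert H % 2 == 0 and W % 2 == 0, "spatial dims must be even"
    merged = []
    for i in range(0, H, 2):
        row = []
        for j in range(0, W, 2):
            # concatenate channel lists of the four neighboring patches
            row.append(fmap[i][j] + fmap[i][j + 1]
                       + fmap[i + 1][j] + fmap[i + 1][j + 1])
        merged.append(row)
    return merged

# A 4x4 map with one channel per patch becomes a 2x2 map with 4 channels.
fmap = [[[i * 4 + j] for j in range(4)] for i in range(4)]
out = patch_merging(fmap)  # out[0][0] == [0, 1, 4, 5]
```

An "improved" merging strategy, as the abstract describes, would modify this step so that adjacent patches interact (e.g., via overlapping neighborhoods) rather than being merged in disjoint 2×2 blocks.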