Vision-based perception is an important link in driverless technology, and semantic segmentation is one of its core tasks. Mainstream semantic segmentation networks mostly use atrous convolution, which incurs a large amount of computation and high memory consumption, to extract high-resolution feature maps. As a result, these networks lack the segmentation speed needed for driverless applications. To solve this problem, a semantic segmentation network with better real-time performance is proposed. Firstly, a lightweight convolutional neural network is used as the encoder, and strided convolution and regular convolution replace the time-consuming and memory-consuming atrous convolution. Secondly, to obtain feature maps similar to those of DeepLab v3+, a new joint upsampling module, Multi-scale Feature Map Joint Pyramid Upsampling (MJPU), is proposed: by fusing multiple feature maps from the encoder, it generates a high-resolution feature map with richer semantic information. Experiments on the Cityscapes dataset show that, compared with the popular semantic segmentation network DeepLab v3+, the proposed network improves the segmentation speed to 31.4 FPS without a large loss of accuracy (mIoU of 44.03%). Therefore, the proposed network has better real-time performance and is more suitable for driverless scenes.
Nadeem Atif, Saquib Mazhar, Shaik Rafi Ahamed, M. K. Bhuyan, Sultan Alfarhood, Mejdl Safran
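To make the architectural description above concrete, the following PyTorch sketch illustrates the two ideas: an atrous-free encoder stage built from strided and regular convolutions, and a joint upsampling module that projects multi-scale encoder features to a common width, upsamples them to a shared resolution, and fuses them. The abstract does not disclose the encoder's or MJPU's internal design, so the module names (DownsampleBlock, MJPUSketch), the channel counts, and the 1x1-projection / bilinear-upsampling / concatenation scheme are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: the abstract does not specify the encoder or
# MJPU internals, so every name, channel count, and design choice below is
# an assumption, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DownsampleBlock(nn.Module):
    """Atrous-free encoder stage: a stride-2 convolution halves the
    spatial resolution, followed by a regular 3x3 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)


class MJPUSketch(nn.Module):
    """Joint upsampling of multi-scale encoder features: project each
    scale to a common channel width, bilinearly upsample to the finest
    resolution, concatenate, and fuse with a 3x3 convolution."""
    def __init__(self, in_channels=(64, 128, 256), width=128):
        super().__init__()
        self.proj = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(c, width, 1, bias=False),
                nn.BatchNorm2d(width),
                nn.ReLU(inplace=True),
            )
            for c in in_channels
        )
        self.fuse = nn.Sequential(
            nn.Conv2d(width * len(in_channels), width, 3, padding=1,
                      bias=False),
            nn.BatchNorm2d(width),
            nn.ReLU(inplace=True),
        )

    def forward(self, feats):
        # feats: encoder maps ordered from the highest resolution down.
        target = feats[0].shape[-2:]
        ups = [
            F.interpolate(p(f), size=target, mode="bilinear",
                          align_corners=False)
            for p, f in zip(self.proj, feats)
        ]
        return self.fuse(torch.cat(ups, dim=1))


if __name__ == "__main__":
    # Hypothetical shapes for a 512x1024 input downsampled 8x/16x/32x.
    feats = [torch.randn(1, 64, 64, 128),
             torch.randn(1, 128, 32, 64),
             torch.randn(1, 256, 16, 32)]
    print(MJPUSketch()(feats).shape)  # torch.Size([1, 128, 64, 128])
```

One motivation for this kind of design is that strided convolutions and bilinear upsampling are cheap and memory-light compared with keeping high-resolution maps alive through atrous convolutions, which is consistent with the speed gain the abstract reports.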