Recently, convolutional neural networks (CNNs) have made a big splash in the field of semantic segmentation, achieving very high segmentation accuracy. In order to meet the requirement of real-time inference, existing methods increase inference speed by reducing the image resolution, leading to lower segmentation performance. We propose in this work a multi-level feature fusion network referred to as MLFFNet that utilizes a novel deep neural network architecture for efficient and real-time semantic segmentation. To strike a balance between speed and performance, MLFFNet substantially reduces the computational complexity by using a lightweight feature extraction network to implement feature reuse through multi-level feature fusion. In addition, MLFFNet targets at excellent segmentation performance through a channel attention mechanism and dilated convolutions with different rates. Specifically, MLFFNet achieves 72.6% mIoU on Cityscapes with the speed of 68.3 FPS on one NVIDIA Titan X card, which is significantly faster than the existing methods with comparable performance.
Boxiang ZhangWenhui LiYuming HuiJiayun LiuYuanyuan Guan
Tanmay SinghaDuc-Son PhamAneesh KrishnaTom Gedeon
Xiaochuan MaZhao XunBomin MaoYijie XunHongzhi Guo