Semantic segmentation of images promises numerous benefits for augmented reality applications. However, the scenes typical of such applications are challenging for current segmentation algorithms due to the high variability in object appearance and distribution. We propose a new cascaded loss fusion strategy that improves the training schedule of state-of-the-art real-time RGB-D semantic segmentation architectures. Specifically, we employ methods developed in the context of multi-task learning to address the multi-class and multi-loss learning problems in semantic segmentation. Through our quantitative evaluation on the NYUv2 [3] and SUNRGB-D [4] benchmark datasets, we show improvements over state-of-the-art approaches. Furthermore, our approach also improves results qualitatively, both on the benchmark datasets and on our own recordings of scenarios typical for head-mounted cameras.
Zongwei Wu, Zhuyun Zhou, Guillaume Allibert, Christophe Stolz, Cédric Demonceaux, Chao Ma