Emmanuel Asiedu Brempong, Simon Kornblith, Ting Chen, Niki Parmar, Matthias Minderer, Mohammad Norouzi
Semantic segmentation labels are expensive and time-consuming to acquire. To improve the label efficiency of semantic segmentation models, we revisit denoising autoencoders and study the use of a denoising objective for pretraining UNets. We pretrain a Transformer-based UNet as a denoising autoencoder, then fine-tune it on semantic segmentation using few labeled examples. Denoising pretraining outperforms training from random initialization, and even supervised ImageNet-21K pretraining of the encoder when the number of labeled images is small. A key advantage of denoising pretraining over supervised pretraining of the backbone is the ability to pretrain the decoder, which would otherwise be randomly initialized. We thus propose a novel Decoder Denoising Pretraining (DDeP) method, in which we initialize the encoder using supervised learning and pretrain only the decoder using the denoising objective. Despite its simplicity, DDeP achieves state-of-the-art results on label-efficient semantic segmentation, offering considerable gains on the Cityscapes, Pascal Context, and ADE20K datasets.
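The denoising objective described above can be sketched in a few lines: corrupt a clean image with Gaussian noise and train the network (here, a stand-in predictor) to recover the noise under a mean-squared-error loss. This is a minimal illustration, not the paper's exact formulation; the noise scale `sigma` and the zero-predicting placeholder model are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_input(x, sigma, rng):
    # Corrupt the clean image x with additive Gaussian noise of scale sigma.
    eps = rng.normal(0.0, 1.0, size=x.shape)
    return x + sigma * eps, eps

def denoising_loss(predicted_eps, eps):
    # Mean-squared error between the model's noise prediction and the true noise;
    # in DDeP-style pretraining this loss trains the decoder.
    return float(np.mean((predicted_eps - eps) ** 2))

# Toy example: a placeholder "decoder" that predicts zero noise everywhere.
x = rng.normal(size=(4, 4))            # stand-in for a clean image
x_tilde, eps = noisy_input(x, sigma=0.5, rng=rng)
loss = denoising_loss(np.zeros_like(eps), eps)
```

In the actual method, `predicted_eps` would come from the UNet decoder applied to `x_tilde`; after pretraining, the denoising head is replaced with a segmentation head for fine-tuning.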