Jianwu Li, Kaiyue Shi, Guo-Sen Xie, Xiaofeng Liu, Jian Zhang, Tianfei Zhou
The goal of this paper is to reduce the training cost of few-shot semantic segmentation (FSS) models. Although FSS by nature improves model generalization to new concepts using only a handful of test exemplars, it relies on strong supervision from a considerable amount of labeled training data for base classes. However, collecting pixel-level annotations is notoriously expensive and time-consuming, and small-scale training datasets convey low information density, which limits test-time generalization. To address this issue, we take a pioneering step towards label-efficient training of FSS models from fully unlabeled training data, optionally supplemented with a few labeled samples to improve performance. This motivates an approach based on a novel unsupervised meta-training paradigm. In particular, the approach first distills pre-trained unsupervised pixel embeddings into compact semantic clusters, from which a massive number of pseudo meta-tasks are constructed. To mitigate the noise in the pseudo meta-tasks, we further advocate a robust Transformer-based FSS model with a novel prototype-based cross-attention design. Extensive experiments on two standard benchmarks, PASCAL-5i and COCO-20i, show that our method achieves impressive performance without any annotations and is comparable to fully supervised competitors when using only 20% of the annotations. Our code is available at: https://github.com/SSSKYue/UMTFSS.
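The pseudo meta-task construction described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not say how the pixel embeddings are clustered, so plain k-means is assumed here, and the names `kmeans`, `pseudo_masks`, and `sample_pseudo_task`, as well as the `min_pixels` threshold, are hypothetical. The idea shown is only the general one: cluster ids act as pseudo-classes, and any two images that share a cluster can form a one-shot support/query episode.

```python
import numpy as np


def kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's clustering).

    features: (N, D) array. Returns (centroids (k, D), labels (N,)).
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    init = rng.choice(len(features), size=k, replace=False)
    centroids = features[init].astype(float)
    labels = np.zeros(len(features), dtype=np.int64)
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        diffs = features[:, None, :] - centroids[None, :, :]
        labels = (diffs ** 2).sum(axis=-1).argmin(axis=1)
        for c in range(k):
            members = features[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return centroids, labels


def pseudo_masks(embeddings, k, seed=0):
    """Cluster all pixel embeddings jointly and return per-pixel cluster ids.

    embeddings: (num_images, H, W, D). Returns (num_images, H, W) int array.
    """
    n, h, w, d = embeddings.shape
    _, labels = kmeans(embeddings.reshape(-1, d), k, seed=seed)
    return labels.reshape(n, h, w)


def sample_pseudo_task(masks, rng, min_pixels=4):
    """Sample one 1-shot pseudo meta-task from cluster-id maps.

    Picks a pseudo-class that covers at least `min_pixels` pixels in two or
    more images, then draws two distinct images as support and query.
    Returns (support_idx, query_idx, pseudo_class, support_mask, query_mask).
    """
    candidates = []
    for c in range(int(masks.max()) + 1):
        imgs = [i for i in range(len(masks)) if (masks[i] == c).sum() >= min_pixels]
        if len(imgs) >= 2:
            candidates.append((c, imgs))
    if not candidates:
        raise ValueError("no pseudo-class appears in two or more images")
    c, imgs = candidates[int(rng.integers(len(candidates)))]
    s, q = rng.choice(imgs, size=2, replace=False)
    return int(s), int(q), int(c), masks[s] == c, masks[q] == c


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for pre-trained pixel embeddings: 4 images, 8x8 pixels, 3-D.
    emb = rng.normal(scale=0.1, size=(4, 8, 8, 3))
    emb[:, :, 4:, :] += 10.0  # right half of every image is a second "concept"
    masks = pseudo_masks(emb, k=2)
    s, q, c, s_mask, q_mask = sample_pseudo_task(masks, rng)
    print("support image", s, "query image", q, "pseudo-class", c)
```

Because the masks come from clustering rather than annotation, they will be noisy on real data, which is the problem the abstract's prototype-based cross-attention model is meant to address.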
Ayyappa Kumar Pambala, Titir Dutta, Soma Biswas
Xiang Li, Zhuoming Xu, Qi Xu, Yan Tang
Haohan Wang, Liang Liu, Wuhao Zhang, Jiangning Zhang, Zhenye Gan, Yabiao Wang, Shuo Wang, Haoqian Wang
Pasquale De Marinis, Nicola Fanelli, Raffaele Scaringi, Emanuele Colonna, Giuseppe Fiameni, Gennaro Vessio, Giovanna Castellano
Jonathan Perry, Amanda Fernandez