Generalized few-shot semantic segmentation (GFSS), which requires strong segmentation performance on novel classes while retaining performance on base classes, is attracting increasing attention. Recent studies have demonstrated the effectiveness of visual prompts for GFSS, but issues remain. Because the background is confused with novel-class foregrounds during base-class pre-training, the learned base visual prompts mislead the novel visual prompts during novel-class fine-tuning, leading to sub-optimal results. This paper proposes a background-enhanced visual prompting Transformer (Beh-VPT) to address this problem. Specifically, we propose background visual prompts, which learn potential novel-class information hidden in the background during base-class pre-training and transfer this information to the novel visual prompts during novel-class fine-tuning via our proposed Hybrid Causal Attention Module. Additionally, we propose a background-enhanced segmentation head, used in conjunction with the background prompts, to strengthen the model’s capacity for learning novel classes. Since GFSS evaluates both base and novel classes, we introduce Singular Value Fine-Tuning in the non-meta-learning paradigm to further unleash the full potential of the model. Extensive experiments show that the proposed method achieves state-of-the-art GFSS performance on the PASCAL-5i and COCO-20i datasets. For example, considering both base and novel classes, mIoU on COCO-20i improves by 0.47% and 1.08% in the one-shot and five-shot settings, respectively. Moreover, our method does not degrade base-class mIoU relative to the baseline.
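The abstract names Singular Value Fine-Tuning (SVF) without detailing how it is applied; below is a minimal PyTorch sketch of the general SVF idea (train only the singular values of frozen pre-trained weights), not the paper’s actual implementation. The class name SVFLinear, the layer shapes, and the optimizer setup are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SVFLinear(nn.Module):
    """Wrap a pre-trained nn.Linear so that only its singular values train.

    The frozen weight W is factorized once as W = U @ diag(s) @ Vh;
    U and Vh are stored as buffers (no gradients), while s becomes the
    sole trainable parameter of the layer. (Hypothetical sketch.)
    """

    def __init__(self, linear: nn.Linear):
        super().__init__()
        U, s, Vh = torch.linalg.svd(linear.weight.data, full_matrices=False)
        self.register_buffer("U", U)    # frozen left singular vectors
        self.register_buffer("Vh", Vh)  # frozen right singular vectors
        self.s = nn.Parameter(s)        # trainable singular values
        # Keep the original bias frozen as well (None is allowed).
        self.register_buffer(
            "bias", linear.bias.data.clone() if linear.bias is not None else None
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reassemble the weight from the partially trainable SVD factors.
        weight = self.U @ torch.diag(self.s) @ self.Vh
        return F.linear(x, weight, self.bias)


# Usage: replace a backbone projection with its SVF counterpart, then
# fine-tune with an optimizer that only sees the singular values.
layer = SVFLinear(nn.Linear(512, 512))
optimizer = torch.optim.SGD([layer.s], lr=1e-3)
```

Because only the 1-D vector of singular values receives gradients, this style of fine-tuning updates a tiny fraction of the parameters, which is why it suits the GFSS setting where base-class knowledge must be preserved while adapting to novel classes.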