Generalized few-shot semantic segmentation (GFSS), which requires strong segmentation performance on novel classes while retaining performance on base classes, is attracting increasing attention. Recent studies have demonstrated the effectiveness of visual prompts for GFSS, but issues remain. Because the background is confused with novel-class foregrounds during base-class pre-training, the learned base visual prompts mislead the novel visual prompts during novel-class fine-tuning, leading to sub-optimal results. This paper proposes a background-enhanced visual prompting Transformer (Beh-VPT) to address this problem. Specifically, we propose background visual prompts, which learn potential novel-class information hidden in the background during base-class pre-training and transfer this information to the novel visual prompts during novel-class fine-tuning via our proposed Hybrid Causal Attention Module. Additionally, we propose a background-enhanced segmentation head, used in conjunction with the background prompts, to strengthen the model’s capacity for learning novel classes. Since GFSS evaluates both base and novel classes, we introduce Singular Value Fine-Tuning in the non-meta-learning paradigm to further unleash the full potential of the model. Extensive experiments show that the proposed method achieves state-of-the-art GFSS performance on the PASCAL-5i and COCO-20i datasets. For example, considering both base and novel classes, mIoU on COCO-20i improves by 0.47% and 1.08% in the one-shot and five-shot settings, respectively. Moreover, our method does not degrade base-class mIoU relative to the baseline.
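The abstract names Singular Value Fine-Tuning (SVF) without detailing how it is applied; below is a minimal PyTorch sketch of the general SVF idea (train only the singular values of frozen pre-trained weights), not the paper’s actual implementation. The class name SVFLinear, the layer shapes, and the optimizer setup are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SVFLinear(nn.Module):
    """Wrap a pre-trained nn.Linear so that only its singular values train.

    The frozen weight W is factorized once as W = U @ diag(s) @ Vh;
    U and Vh are stored as buffers (no gradients), while s becomes the
    sole trainable parameter of the layer. (Hypothetical sketch.)
    """

    def __init__(self, linear: nn.Linear):
        super().__init__()
        U, s, Vh = torch.linalg.svd(linear.weight.data, full_matrices=False)
        self.register_buffer("U", U)    # frozen left singular vectors
        self.register_buffer("Vh", Vh)  # frozen right singular vectors
        self.s = nn.Parameter(s)        # trainable singular values
        # Keep the original bias frozen as well (None is allowed).
        self.register_buffer(
            "bias", linear.bias.data.clone() if linear.bias is not None else None
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reassemble the weight from the partially trainable SVD factors.
        weight = self.U @ torch.diag(self.s) @ self.Vh
        return F.linear(x, weight, self.bias)


# Usage: replace a backbone projection with its SVF counterpart, then
# fine-tune with an optimizer that only sees the singular values.
layer = SVFLinear(nn.Linear(512, 512))
optimizer = torch.optim.SGD([layer.s], lr=1e-3)
```

Because only the 1-D vector of singular values receives gradients, this style of fine-tuning updates a tiny fraction of the parameters, which is why it suits the GFSS setting where base-class knowledge must be preserved while adapting to novel classes.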