Weakly supervised semantic segmentation aims to achieve segmentation performance comparable to that of fully supervised methods using low-cost annotations such as image-level labels or bounding boxes. This article systematically reviews two families of weakly supervised methods: those based on image-level labels and those based on bounding-box supervision. For image-level supervision, mainstream methods generate initial seed regions through Class Activation Mapping (CAM) and then apply pixel-correlation-based expansion or iterative optimization strategies (such as erasing and adversarial training) to address the tendency of CAM to cover only the most discriminative regions; representative works such as SEC and AE-PSL improve segmentation completeness by introducing saliency constraints or self-training mechanisms. For bounding-box supervision, BoxSup and related work have shown that CRF refinement or instance segmentation frameworks such as Mask R-CNN can effectively exploit the coordinate information inside the box to generate high-quality pseudo labels, while recent work such as BBAM explores pixel-level relationships inside the box through attention mechanisms. Furthermore, this article compares the efficiency-performance trade-off between the two types of supervision signals: image-level labels have the lowest annotation cost but rely on complex post-processing. Experiments have shown that hybrid-supervision methods combining multi-stage self-training with cross-modal consistency constraints, such as SDI-MTL, can significantly narrow the performance gap between weakly supervised and fully supervised methods. Future directions include noise-robust label propagation mechanisms and weakly supervised learning frameworks for cross-task collaboration.
Xiang Weikang, Quan Zhou, Cui Jingcheng, Mo Zhiyi, Xiaofu Wu, Weihua Ou, Wang Jingdong, Wenyu Liu
Baidu, Beijing 100085, China
Jizhi Zhang, Guoying Zhang, Qiangyu Wang, Shuang Bai
Binxiu Liang, Yan Liu, Linxi He, Jiangyun Li
Ci-Siang Lin, Chien-Yi Wang, Yu-Chiang Frank Wang, Min-Hung Chen