Mingwei Yao, Kehua Guo, Lingyan Zhang, Xuyang Tan, Xiaokang Zhou
Manual annotation for crowd counting remains labor-intensive and costly. Although existing semi-supervised methods partially alleviate this burden, they still struggle with the quality of the generated pseudo-labels and with how effectively unlabeled data is used. To address these issues, we propose a novel semi-supervised crowd counting framework called Point-Adaptive Teacher (PAT), which integrates an Adaptive Soft Threshold (AST) with contrastive learning to improve pseudo-label quality and make effective use of unlabeled data. Specifically, we adopt the Swin Transformer as the backbone and develop Swin-P2PNet, which captures global contextual information through hierarchical window attention and thereby improves pseudo-label accuracy. Additionally, we design AST, which dynamically adjusts each sample's loss weight by combining confidence and uncertainty predictions, mitigating the effect of noise in the pseudo-labels. Finally, we introduce a contrastive learning strategy that requires no extra parameters and strengthens the model's ability to learn latent representations from unlabeled data. Extensive experiments on three public datasets, ShanghaiTech, JHU-Crowd++, and UCF-QNRF, show that our method achieves performance comparable to state-of-the-art methods.
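The abstract states only that AST adjusts each sample's loss weight by combining confidence and uncertainty predictions. It does not give the formula. The sketch below illustrates one plausible form of such soft, adaptive weighting; it is not the authors' method. Every specific choice is an assumption: the batch-mean confidence as the adaptive threshold, a sigmoid soft gate with a `temperature` parameter, and an `exp(-u)` uncertainty penalty. The function names `adaptive_soft_weights` and `weighted_pseudo_label_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def adaptive_soft_weights(
    confidences: Sequence[float],
    uncertainties: Sequence[float],
    temperature: float = 0.1,
) -> List[float]:
    """Hypothetical AST-style per-point weights (NOT the paper's exact formula).

    Assumptions: the threshold adapts to the batch as its mean confidence;
    points below it are softly down-weighted by a sigmoid gate rather than
    discarded; higher predicted uncertainty shrinks the weight via exp(-u).
    """
    if len(confidences) != len(uncertainties):
        raise ValueError("confidences and uncertainties must have equal length")
    if not confidences:
        return []
    # Assumed adaptive threshold: mean confidence of the current batch.
    tau = sum(confidences) / len(confidences)
    weights = []
    for c, u in zip(confidences, uncertainties):
        soft_gate = _sigmoid((c - tau) / temperature)  # soft, not hard, cut-off
        weights.append(soft_gate * math.exp(-u))  # assumed uncertainty penalty
    return weights


def weighted_pseudo_label_loss(
    per_point_losses: Sequence[float], weights: Sequence[float], eps: float = 1e-8
) -> float:
    """Weight-normalized average of per-point pseudo-label losses."""
    total_w = sum(weights)
    return sum(w * l for w, l in zip(weights, per_point_losses)) / max(total_w, eps)


if __name__ == "__main__":
    w = adaptive_soft_weights([0.9, 0.6, 0.2], [0.1, 0.1, 0.5])
    print([round(x, 3) for x in w])
    print(weighted_pseudo_label_loss([0.0, 0.0, 10.0], w))
```

With these inputs, the low-confidence, high-uncertainty third point receives a much smaller weight than the other two. A large loss on that point therefore contributes little to the total. A soft gate of this kind keeps a small gradient signal from borderline pseudo-labels, whereas a hard threshold would drop them entirely.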
Feng Min, Linlin Hao, Yonggang Kuang
Bo Li, Yong Zhang, Haihui Xu, Baocai Yin
Xin Deng, Songjian Chen, Yifan Chen, Jie-Fang Xu
Xing Wei, Yunfeng Qiu, Zhiheng Ma, Xiaopeng Hong, Yihong Gong
Shiwei Zhang, Wei Ke, Shuai Liu, Xiaopeng Hong, Tong Zhang