Wei Jiao, Jianghui Xu, Yijiao Fang, Jiaojiao Huang, Yujie Zhu, Dandan Ling
Abstract

Background: Dermoscopic lesion segmentation is crucial for dermatology, yet existing methods struggle to integrate global context with local details under the efficiency constraints required for clinical use.

Purpose: We aim to develop a lightweight model that captures long-range spatial dependencies while preserving fine-grained boundary details of dermoscopic lesions. The method targets a favorable accuracy–efficiency trade-off, so that segmentation performance improves without ruling out practical clinical deployment.

Methods: We propose HCViT-Net, a lightweight hybrid model with an encoder–decoder architecture. A multi-scale query transformer (MSQFormer) is incorporated into each stage of the convolutional encoder to efficiently capture global, multi-scale context. In addition, a wavelet-guided attention refinement module (WARM) is placed on the highest-resolution skip connection to selectively enhance high-frequency boundary details and bridge the semantic gap between the encoder and decoder.

Results: On the ISIC 2017 and ISIC 2018 datasets, the model achieved a mean intersection-over-union (mIoU) of 87.76% and 87.45%, respectively. With only 5.76M parameters and 7.51 GFLOPs, it performs competitively with existing methods at a significantly lower computational cost.

Conclusions: HCViT-Net achieves an excellent accuracy–efficiency trade-off. It improves segmentation accuracy with a low computational footprint, showing strong potential for practical deployment in dermatology workflows.
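For readers unfamiliar with the mIoU metric reported in the Results, the following is a minimal sketch of one common way to compute it for binary lesion masks. The abstract does not state how the scores are averaged, so this sketch assumes the foreground IoU is computed per image and then averaged over images. The function names `binary_iou` and `mean_iou` are hypothetical and not taken from the paper.

```python
def binary_iou(pred, target):
    """IoU between two binary masks given as nested lists of 0/1 values."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p, t = bool(p), bool(t)
            inter += int(p and t)   # pixel is lesion in both masks
            union += int(p or t)    # pixel is lesion in at least one mask
    if union == 0:
        # Both masks are empty; treated as perfect agreement (an assumption).
        return 1.0
    return inter / union


def mean_iou(preds, targets):
    """Average the per-image IoU over a dataset (assumed averaging scheme)."""
    scores = [binary_iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    target = [[1, 0], [0, 0]]
    print(binary_iou(pred, target))
    print(mean_iou([pred, target], [target, target]))
```

Other conventions exist, such as accumulating a single confusion matrix over the whole test set or averaging the foreground and background classes, and they can give different numbers for the same predictions.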
Jiang Qian, Haiyan Li, Shijun Liao, Zhe Xiao, Weihua Li, Haofei Li
Haicheng Qu, Yi Gao, Qingling Jiang, Ying Wang
Guangzhe Zhao, Xingguo Zhu, Xueping Wang, Feihu Yan
Ying Wang, Meng Zhang, Jian-An Liang, Meiyan Liang