Ruyu Liu, Lin Wang, Zhihao Yu, Haoyu Zhang, Xiufeng Liu, Bo Sun, Xiaoguang Huo, Jianhua Zhang
Abstract Efficient semantic segmentation on edge devices is critical for cloud-edge perception, yet achieving high accuracy under resource constraints remains challenging. To address this problem, this paper presents SCTNet-NAS, a framework for efficient edge segmentation via cloud-edge co-optimization. SCTNet-NAS unifies multi-objective neural architecture search (NAS), feature-based knowledge distillation from a cloud-based vision transformer (ViT) teacher, and specialized decoder design to deliver both high accuracy and real-time efficiency for semantic segmentation on resource-constrained edge devices. The method first constructs a weight-sharing supernet and applies the non-dominated sorting genetic algorithm (NSGA-II) to explore candidate encoders in a single forward pass, then transfers global context from the ViT teacher to each candidate via the VitGuidance feature-level distillation scheme. In addition, the proposed SCTHead and AU_SCTHead decoder modules perform adaptive channel re-calibration, improving segmentation accuracy and boundary delineation. Evaluations demonstrate that SCTNet-NAS achieves a markedly better accuracy-efficiency trade-off than state-of-the-art methods, enabling high-performance edge AI perception.
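The multi-objective selection step the abstract refers to relies on NSGA-II's fast non-dominated sorting, which ranks candidate encoders into Pareto fronts over competing objectives (e.g. error versus latency). The following is a minimal, self-contained sketch of that sorting step only, not the paper's implementation; the candidate scores are hypothetical and both objectives are treated as minimized.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly
    better in at least one (minimization convention)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def fast_non_dominated_sort(points):
    """Return Pareto fronts as lists of indices; front 0 is non-dominated."""
    n = len(points)
    dominated_by = [[] for _ in range(n)]  # indices each point dominates
    dom_count = [0] * n                    # how many points dominate each point
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(points[i], points[j]):
                dominated_by[i].append(j)
            elif dominates(points[j], points[i]):
                dom_count[i] += 1
        if dom_count[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        nxt = []
        for i in fronts[k]:
            for j in dominated_by[i]:
                dom_count[j] -= 1
                if dom_count[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
        k += 1
    return fronts[:-1]  # drop the trailing empty front

# Hypothetical candidate encoders scored as (segmentation error, latency in ms)
cands = [(0.20, 12.0), (0.25, 8.0), (0.22, 15.0), (0.30, 7.0), (0.21, 11.0)]
fronts = fast_non_dominated_sort(cands)
# Candidate 2 is dominated (candidates 0 and 4 are better on both objectives),
# so it falls into the second front; the rest form the Pareto front.
```

In a full NSGA-II loop, these fronts would be combined with crowding-distance sorting to select the next generation of supernet sub-architectures.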