Introduction
Automatic and accurate segmentation of cherry tomato maturity in natural environments is the foundation for automated picking. However, the small visual differences between adjacent maturity stages and mutual occlusion between fruits often hinder the picking process. Based on the changes in the phenotypic characteristics of cherry tomatoes during ripening and the Chinese standard GH/T 1193-2021, this study proposed a lightweight instance segmentation method that grades cherry tomato maturity into five levels (green, turning, pink, light red, and red). The method improves on the YOLOv8n-Seg model and is named MobileViTv3-SK-WIoU-YOLOv8n-Seg (MSW-YOLOv8n-Seg).
Methods
In this model, MobileViTv3 was introduced as the backbone of the original YOLOv8 model for feature extraction, reducing the number of parameters. A selective kernel (SK) attention module was added to the neck to improve the model's feature representation ability. The complete intersection over union (CIoU) loss function in the original head was replaced with wise intersection over union (WIoU), which can effectively filter out low-quality samples and improve the stability and reliability of the model in complex scenes. Together, these changes allow the proposed model to better balance segmentation speed, accuracy, and computational complexity.
Results
The experimental results show that the improved model achieved a bounding box precision, recall, and mean average precision at an IoU threshold of 0.5 (mAP@0.5) of 90.8%, 86.3%, and 83.9%, respectively, on the test sets, with a model size of 6.0 MB. Compared with YOLOv7-Mask, YOLOv8n-Seg, YOLOv9s-Seg, YOLO11n-Seg, Mask R-CNN (Mask region-based convolutional neural network), and Mask2Former, the bounding box precision increased by 9.6%, 5.2%, 5.7%, 12.3%, 13.3%, and 5.0%; the recall increased by 7.8%, 7.4%, 8.8%, 13.1%, 13.9%, and 0.1%; and mAP@0.5 increased by 10.5%, 3.0%, 0.9%, 15.0%, 13.8%, and 1.4%, respectively.
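The box precision and recall reported above depend on matching predicted boxes to ground-truth boxes at an IoU threshold of 0.5. The following is a minimal, simplified sketch of that matching in plain Python. It assumes axis-aligned boxes in (x1, y1, x2, y2) format and greedy, score-ordered matching within each maturity class. The function names are hypothetical, and the abstract does not describe the exact evaluation code used in the study. mAP@0.5 additionally averages the per-class area under the precision-recall curve, which this sketch does not compute.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Tuple[Box, float, str]],  # (box, confidence score, maturity class)
    gts: List[Tuple[Box, str]],           # (box, maturity class)
    iou_thr: float = 0.5,
) -> Tuple[float, float]:
    """Hypothetical helper: greedy matching at a fixed IoU threshold.

    Predictions are visited in descending score order; each one claims the
    unmatched ground-truth box of the same class with the highest IoU,
    provided that IoU is at least iou_thr. Unmatched predictions are false
    positives; unmatched ground truths are false negatives.
    """
    matched = [False] * len(gts)
    tp = 0
    for box, _score, cls in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, (gbox, gcls) in enumerate(gts):
            if matched[j] or gcls != cls:
                continue
            v = iou(box, gbox)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched[best_j] = True
            tp += 1
    fp = len(preds) - tp
    fn = len(gts) - tp
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    return precision, recall


if __name__ == "__main__":
    gts = [((0, 0, 10, 10), "red"), ((20, 20, 30, 30), "pink")]
    preds = [
        ((1, 1, 10, 10), 0.9, "red"),      # IoU 0.81 with the red fruit: true positive
        ((20, 20, 30, 30), 0.8, "green"),  # right place, wrong maturity class: false positive
        ((50, 50, 60, 60), 0.7, "red"),    # no overlapping fruit: false positive
    ]
    print(precision_recall(preds, gts))  # precision 1/3, recall 1/2
```

A prediction whose box is well placed but whose maturity class is wrong counts as a false positive here. This is one reason why adjacent maturity levels that look alike can lower both precision and recall.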
In terms of inference speed, MSW-YOLOv8n-Seg was the fastest of the compared models, reaching up to 52.9 frames per second (FPS) with a latency of only 18.2 ms, which demonstrates its real-time processing capability.
Discussion
The results show that the improved MSW-YOLOv8n-Seg model performs best among the compared models. It is suitable for instance segmentation scenarios that require high real-time performance and provides a useful basis for automated cherry tomato picking.
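The WIoU loss that replaced CIoU in the head can be sketched in plain Python. This sketch follows the Wise-IoU formulation of Tong et al. (2023). WIoU v1 scales L_IoU = 1 - IoU by a distance-based factor R_WIoU = exp(((x - x_gt)^2 + (y - y_gt)^2) / (W_g^2 + H_g^2)). Here (x, y) and (x_gt, y_gt) are the box centres, and W_g and H_g are the width and height of the smallest box enclosing both. WIoU v3 further multiplies this loss by a non-monotonic focusing coefficient r = β / (δ α^(β - δ)), where the outlier degree is β = L_IoU / mean(L_IoU). The abstract does not state which WIoU version or hyperparameters the authors used. The values α = 1.9 and δ = 3 are assumed here, and the function names are hypothetical. In training, the enclosing-box term and β are detached from the gradient, and the mean is a running average. This sketch works on plain floats, so detaching has no effect.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def wiou_v1(pred: Box, gt: Box) -> float:
    """Hypothetical helper: WIoU v1 = R_WIoU * (1 - IoU)."""
    l_iou = 1.0 - iou(pred, gt)
    # Centres of the predicted and ground-truth boxes.
    px, py = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    # Width and height of the smallest box enclosing both boxes
    # (this denominator is detached from the gradient during training).
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])
    r_wiou = math.exp(((px - gx) ** 2 + (py - gy) ** 2) / (wg ** 2 + hg ** 2))
    return r_wiou * l_iou


def wiou_v3_focus(l_iou: float, mean_l_iou: float,
                  alpha: float = 1.9, delta: float = 3.0) -> float:
    """Hypothetical helper: non-monotonic focusing coefficient of WIoU v3.

    beta is the outlier degree (this sample's L_IoU relative to the running
    mean). Samples with very large beta, which are typically low-quality
    boxes, receive a small coefficient and so contribute less gradient.
    """
    beta = l_iou / mean_l_iou
    return beta / (delta * alpha ** (beta - delta))


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    print(wiou_v1(gt, gt))                     # perfect box: zero loss
    print(wiou_v1((2.0, 0.0, 12.0, 10.0), gt))  # shifted box: (1/3) * exp(4/244)
    print(wiou_v3_focus(3.0, 1.0))             # beta == delta: coefficient 1
    print(wiou_v3_focus(10.0, 1.0))            # outlier: much smaller coefficient
```

When β equals δ, the coefficient is exactly 1. It peaks for moderately poor boxes and falls off sharply for outliers, so boxes that are very poorly matched, and often mislabeled, have less influence on training. This behaviour matches the abstract's claim that WIoU filters low-quality samples.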
Mahamed Abdelmadjid Allali, Nassima Bousahba, Hanaa Hadj Kaddour, Asma Nedjari, Halla Guetarni
C. Geng, Aimin Wang, Yang Cheng, Zhiqiang Xu, Yu Xu, Xingguo Liu, Hao Zhu
Weibin Wu, Zhaokai He, Junlin Li, Tianci Chen, Qing Luo, Yuanqiang Luo, Weihui Wu, Zhenbang Zhang