JOURNAL ARTICLE

Lightweight MSW-YOLOv8n-Seg: the instance segmentation of maturity on cherry tomato with improved YOLOv8n-Seg

Ronghui Miao, Zhiwei Li

Year: 2026
Journal: Frontiers in Plant Science
Vol: 16
Publisher: Frontiers Media

Abstract

Introduction: Automatic and accurate segmentation of cherry tomato maturity in natural environments is the foundation for automatic picking. The lack of significant differences between adjacent maturity levels and the mutual occlusion between fruits usually hamper the picking process. Based on the changes in the phenotypic characteristics of cherry tomatoes during ripening and the Chinese national standard GH/T 1193-2021, a lightweight instance segmentation method for cherry tomato maturity with five levels (green, turning, pink, light red and red) is proposed. It is built on an improved YOLOv8n-Seg model and named MobileViTv3-SK-WIoU-YOLOv8n-Seg (MSW-YOLOv8n-Seg).

Methods: In this model, MobileViTv3 is introduced as the backbone of the original YOLOv8 model for feature extraction, which reduces the number of parameters; a selective kernel (SK) attention module is added to the neck to improve the feature expression ability of the model; and the complete intersection over union (CIoU) loss function in the original head is replaced with wise intersection over union (WIoU), which can effectively filter low-quality samples and improve the stability and reliability of the model in complex scenes. The proposed model better balances segmentation speed, accuracy, and computational complexity.

Results: The experimental results show that the bounding box precision, recall and mean average precision (mAP)@0.5 of the improved model on the test sets were 90.8%, 86.3% and 83.9% respectively, and the model size was 6.0 MB. Compared with YOLOv7-Mask, YOLOv8n-Seg, YOLOv9s-Seg, YOLO11n-Seg, Mask R-CNN (Mask region-based convolutional neural network) and Mask2Former, the bounding box precision increased by 9.6%, 5.2%, 5.7%, 12.3%, 13.3% and 5.0%, the recall increased by 7.8%, 7.4%, 8.8%, 13.1%, 13.9% and 0.1%, and the mAP@0.5 increased by 10.5%, 3.0%, 0.9%, 15.0%, 13.8% and 1.4% respectively. MSW-YOLOv8n-Seg also has the highest inference speed among the compared models, reaching up to 52.9 frames per second (FPS) with a latency of only 18.2 ms, which demonstrates its real-time processing capability.

Discussion: The results show that the improved MSW-YOLOv8n-Seg model achieves the best overall performance among the compared models. It is suitable for instance segmentation scenarios that demand high real-time performance and offers a useful exploration toward automated cherry tomato picking.
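The swap from CIoU to WIoU described in the Methods can be illustrated with a minimal sketch. The abstract does not say which WIoU variant the authors use, so the code below assumes the simplest published form (often called WIoU v1), in which the plain IoU loss is scaled by a distance-based factor computed from the box centres and the smallest enclosing box. The function names `iou` and `wiou_v1_loss` and the `(x1, y1, x2, y2)` box layout are illustrative choices, not taken from the paper, and later WIoU variants add a dynamic focusing term that is not shown here.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def wiou_v1_loss(pred, target):
    """Assumed WIoU v1 form: exp(centre_dist^2 / enclosing_diag^2) * (1 - IoU)."""
    l_iou = 1.0 - iou(pred, target)

    # Centres of the predicted and ground-truth boxes.
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_t, cy_t = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2

    # Width and height of the smallest box enclosing both boxes.
    # In a training framework this denominator is treated as a constant
    # (detached from the gradient); plain Python has no gradient to detach.
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    denom = wg ** 2 + hg ** 2

    dist_sq = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2
    r_wiou = math.exp(dist_sq / denom) if denom > 0 else 1.0
    return r_wiou * l_iou


if __name__ == "__main__":
    pred = (0.0, 0.0, 2.0, 2.0)
    target = (1.0, 1.0, 3.0, 3.0)
    print("IoU loss :", 1.0 - iou(pred, target))
    print("WIoU loss:", wiou_v1_loss(pred, target))
```

Because the scaling factor is never below 1, a prediction whose centre is far from the ground-truth centre is penalised more than under the plain IoU loss, while a perfectly aligned prediction still gives a loss of zero.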

Keywords:
Segmentation; Minimum bounding box; Pattern recognition (psychology); Kernel (algebra); Intersection (aeronautics); Bounding overwatch; Feature (linguistics); Stability (learning theory)

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 46
Citation Normalized Percentile: 0.50

Topics

Smart Agriculture and AI
Life Sciences →  Agricultural and Biological Sciences →  Plant Science
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Data and IoT Technologies
Physical Sciences →  Engineering →  Electrical and Electronic Engineering