JOURNAL ARTICLE

YolTrack: Multitask Learning Based Real-Time Multiobject Tracking and Segmentation for Autonomous Vehicles

Xuepeng Chang, Huihui Pan, Weichao Sun, Huijun Gao

Year: 2021
Journal: IEEE Transactions on Neural Networks and Learning Systems
Vol: 32 (12)
Pages: 5323-5333
Publisher: Institute of Electrical and Electronics Engineers

Abstract

Modern autonomous vehicles must perform various visual perception tasks for scene construction and motion decision. Multiobject tracking and instance segmentation (MOTS) are the main tasks, since they directly influence the steering and braking of the car. Implementing both tasks with a multitask learning neural network presents significant challenges in performance and complexity. Current work on MOTS focuses on improving the precision of the network with a two-stage tracking-by-detection model, which struggles to satisfy the real-time requirement of autonomous vehicles. In this article, a real-time multitask network named YolTrack, based on a one-stage instance segmentation model, is proposed to perform the MOTS task, achieving an inference speed of 29.5 frames per second (fps) with a slight drop in accuracy and precision. YolTrack uses ShuffleNet V2 with a feature pyramid network (FPN) as its backbone, from which two decoders are extended to generate instance segments and embedding vectors. Segmentation masks are used to improve tracking performance by performing a logical AND operation with the feature maps, showing that foreground segmentation plays an important role in object tracking. The different scales of the multiple tasks are balanced by an optimized geometric mean loss during the training phase. Experimental results on the KITTI MOTS data set show that YolTrack outperforms other state-of-the-art MOTS architectures in real-time performance and is appropriate for deployment in autonomous vehicles.
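
The two mechanisms named in the abstract, mask gating of feature maps and a geometric mean loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mask_features` and `geometric_mean_loss` are hypothetical, the feature map is a plain 2-D list rather than a network tensor, and the loss is the plain geometric mean, since the abstract does not specify how the "optimized" variant differs.

```python
import math


def mask_features(feature_map, mask):
    """Keep feature values where the binary mask is 1 and zero the rest.

    Assumption: this is one simple reading of the "logic AND operation"
    between a segmentation mask and a feature map.
    """
    return [
        [value if keep else 0.0 for value, keep in zip(f_row, m_row)]
        for f_row, m_row in zip(feature_map, mask)
    ]


def geometric_mean_loss(losses):
    """Plain geometric mean of positive per-task losses.

    Assumption: the abstract's "optimized" variant is not reproduced here.
    """
    if not losses or any(loss <= 0 for loss in losses):
        raise ValueError("losses must be a non-empty sequence of positive values")
    # Averaging in log space avoids overflow when many losses are multiplied.
    return math.exp(sum(math.log(loss) for loss in losses) / len(losses))


# Illustrative values only, not taken from the paper.
features = [[1.5, 2.0], [3.0, 4.0]]
foreground = [[1, 0], [0, 1]]
masked = mask_features(features, foreground)
total = geometric_mean_loss([4.0, 1.0])
```

One reason a geometric mean suits tasks with different loss scales: multiplying any single task loss by a constant multiplies the combined loss by a constant as well, so the relative contribution of each task to the gradient does not depend on that task's scale.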

Keywords:
Computer science; Artificial intelligence; Segmentation; Computer vision; Feature (linguistics); Task (project management); Inference; Video tracking; Multi-task learning; Embedding; Object (grammar); Pattern recognition (psychology); Engineering

Metrics

Cited By: 41
FWCI (Field Weighted Citation Impact): 3.07
Refs: 63
Citation Normalized Percentile: 0.92

Topics

Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Air Quality Monitoring and Forecasting
Physical Sciences →  Environmental Science →  Environmental Engineering