JOURNAL ARTICLE

End-to-End Chained Pedestrian Multi-Object Tracking Based on Multi-Feature Fusion

Abstract

An end-to-end chained network with multi-feature fusion is proposed to balance tracking speed and accuracy. It integrates target detection, feature extraction, and data association into a single framework. Each node of the chain covers two adjacent frames, and the paired bounding boxes estimated from overlapping nodes are chained by IoU (Intersection over Union) matching. For multi-feature fusion, a bidirectional feature pyramid with two aggregation paths is presented, in which deformable convolution v2 is applied. To reduce sample imbalance and differences in gradient contribution, focal loss and Balanced L1 loss together form the multi-task learning loss. Results on the MOT17 dataset indicate that the model achieves superior tracking speed (21.6 FPS) and accuracy (69.6 MOTA, 81.0 MOTP).
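
The chaining step described in the abstract can be illustrated with a minimal sketch. It assumes each node outputs a list of box pairs, one box per covered frame, and that two overlapping nodes share one frame. The function names `iou` and `chain_nodes`, the `(x1, y1, x2, y2)` box layout, the 0.5 threshold, and the greedy matching strategy are all assumptions made for illustration; the abstract does not specify how the article itself solves the assignment.

```python
from typing import List, Tuple

# Assumed box layout: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]
# One detection from a node: the same target's box in two adjacent frames.
BoxPair = Tuple[Box, Box]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def chain_nodes(
    prev_pairs: List[BoxPair],
    next_pairs: List[BoxPair],
    threshold: float = 0.5,  # hypothetical value, not taken from the article
) -> List[Tuple[int, int]]:
    """Link box pairs of two overlapping nodes through their shared frame.

    prev_pairs covers frames (t, t+1) and next_pairs covers (t+1, t+2),
    so the second box of a previous pair is compared with the first box
    of a next pair. Matching is greedy by descending IoU (an assumption).
    """
    candidates = sorted(
        (
            (iou(p[1], n[0]), i, j)
            for i, p in enumerate(prev_pairs)
            for j, n in enumerate(next_pairs)
        ),
        reverse=True,
    )
    used_prev, used_next = set(), set()
    matches = []
    for score, i, j in candidates:
        if score < threshold:
            break  # remaining candidates overlap even less
        if i in used_prev or j in used_next:
            continue  # each pair may be linked at most once
        used_prev.add(i)
        used_next.add(j)
        matches.append((i, j))
    return matches
```

Each returned `(i, j)` links the `i`-th pair of the earlier node to the `j`-th pair of the later node, which extends a trajectory by one frame. A globally optimal assignment (for example, the Hungarian algorithm) could replace the greedy loop without changing the interface.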

Keywords:
Computer science; Artificial intelligence; Feature (linguistics); Feature extraction; Pyramid (geometry); Computer vision; Bounding overwatch; Tracking (education); Pedestrian detection; Backbone network; Object detection; Matching (statistics); Convolution (computer science); Pattern recognition (psychology); Video tracking; Intersection (aeronautics); End-to-end principle; Fusion; Pedestrian; Object (grammar); Artificial neural network; Mathematics; Engineering

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.10
Refs: 15
Citation Normalized Percentile: 0.40

Topics

Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Fire Detection and Safety Systems
Physical Sciences →  Engineering →  Safety, Risk, Reliability and Quality
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition