An end-to-end chained network with multi-feature fusion is proposed to balance tracking speed and accuracy; it integrates target detection, feature extraction, and data association into a single framework. The network chains the paired bounding boxes estimated by overlapping nodes through IoU (Intersection over Union) matching, where each node covers two adjacent frames. In addition, a bidirectional feature pyramid with two aggregation paths, in which deformable convolution v2 is applied, is presented for multi-feature fusion. To reduce sample imbalance and the difference in gradient contributions, focal loss and Balanced L1 loss are combined into a multi-task learning loss. Results on the MOT17 dataset indicate that the model achieves superior tracking speed (21.6 FPS) and accuracy (69.6 MOTA, 81.0 MOTP).
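The IoU-based chaining step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that two overlapping nodes share one frame, that each node outputs a list of box pairs in (x1, y1, x2, y2) form, and that pairs are linked greedily by the IoU of their boxes in the shared frame. The names `iou` and `chain_nodes`, the greedy strategy, and the threshold of 0.5 are hypothetical choices made for illustration only.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def chain_nodes(prev_pairs, next_pairs, threshold=0.5):
    """Link box pairs of two overlapping nodes (assumed data layout).

    prev_pairs: [(box_t, box_t1), ...] from the node covering frames t, t+1
    next_pairs: [(box_t1, box_t2), ...] from the node covering frames t+1, t+2
    Returns sorted (i, j) index links, matched greedily by the IoU of the
    two boxes that lie in the shared frame t+1.
    """
    candidates = sorted(
        ((iou(p[1], n[0]), i, j)
         for i, p in enumerate(prev_pairs)
         for j, n in enumerate(next_pairs)),
        reverse=True,
    )
    used_prev, used_next, links = set(), set(), []
    for score, i, j in candidates:
        if score < threshold:  # hypothetical cut-off, not from the source
            break
        if i in used_prev or j in used_next:
            continue
        used_prev.add(i)
        used_next.add(j)
        links.append((i, j))
    return sorted(links)


prev_pairs = [((0, 0, 10, 10), (1, 0, 11, 10)),
              ((50, 50, 60, 60), (52, 50, 62, 60))]
next_pairs = [((52, 50, 62, 60), (54, 50, 64, 60)),
              ((1, 0, 11, 10), (2, 0, 12, 10))]
links = chain_nodes(prev_pairs, next_pairs)
```

Greedy matching is used here only to keep the sketch short; an optimal assignment method could be substituted without changing the surrounding logic.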
ZHOU Haiyun, XIANG Xuezhi, WANG Xinyao, REN Wenkai
Ce Zhang, Chengjie Zhang, Yiluan Guo, Lingji Chen, Michael Happold
Congrui Wang, Tiantian Wang, Nan Jiang, Shanzhi Gu, Long Lan