JOURNAL ARTICLE

Pedestrian Multi-Object Tracking with Bottleneck Transformer and Enhanced Feature Fusion

Abstract

Owing to its balance of tracking accuracy and speed, the joint-detection-and-embedding (JDE) tracking paradigm, which employs a single network to predict detections and appearance features simultaneously, has drawn great attention. Building on the strong CSTrack baseline, we replace the spatial convolutions in the final block of the backbone with a Bottleneck Transformer, which models global relationships across objects and reduces the number of parameters. In addition, we introduce an enhanced feature fusion block that uses structural re-parameterization to strengthen multi-feature fusion, alleviating the conflict between the detection and identity-embedding subtasks while maintaining inference time. Results on the MOT16 and MOT17 datasets indicate that our method achieves competitive tracking performance.
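To make the idea of structural re-parameterization concrete, the following is a minimal sketch of branch fusion in the general style popularized by RepVGG. It is not the fusion block from this article, whose exact layout the abstract does not give. The sketch assumes a single channel, stride 1, zero padding, and no batch normalization. All function names are hypothetical. It shows that a training-time block with parallel 3x3, 1x1, and identity branches can be collapsed into one 3x3 kernel that produces the same output, which is why the extra branches need not add inference cost.

```python
# Illustrative sketch only: single channel, stride 1, zero padding of 1.
# Names are hypothetical and not taken from the article or from CSTrack.

def conv2d_same(x, k):
    """Naive 3x3 cross-correlation with zero padding (output size == input size)."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    ii, jj = i + di - 1, j + dj - 1
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += x[ii][jj] * k[di][dj]
            out[i][j] = acc
    return out


def train_time_forward(x, k3, k1):
    """Multi-branch form: 3x3 conv + 1x1 conv (scalar weight k1) + identity."""
    a = conv2d_same(x, k3)
    h, w = len(x), len(x[0])
    return [[a[i][j] + k1 * x[i][j] + x[i][j] for j in range(w)] for i in range(h)]


def merge_branches(k3, k1):
    """Fold the 1x1 and identity branches into the centre tap of the 3x3 kernel."""
    merged = [row[:] for row in k3]   # copy so the training kernel is untouched
    merged[1][1] += k1 + 1.0          # 1x1 weight plus identity (weight 1)
    return merged


# Small demonstration with integer-valued inputs.
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k3 = [[1.0, 0.0, -1.0], [2.0, 1.0, -2.0], [1.0, 0.0, -1.0]]
k1 = 2.0

multi = train_time_forward(x, k3, k1)
single = conv2d_same(x, merge_branches(k3, k1))
max_diff = max(abs(multi[i][j] - single[i][j]) for i in range(3) for j in range(3))
print("max abs difference between multi-branch and merged outputs:", max_diff)
```

In a real network the same folding is applied per output channel, and any batch-normalization statistics are absorbed into the kernel and bias before the branches are summed. Those steps are omitted here for brevity.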

Keywords:
Computer science; Bottleneck; Artificial intelligence; Embedding; Inference; Tracking (education); Video tracking; Fusion; Object detection; Transformer; Pattern recognition (psychology); Block (permutation group theory); Computer vision; Feature (linguistics); Pedestrian; Feature tracking; Object (grammar); Engineering; Mathematics

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 25
Citation Normalized Percentile: 0.10

Topics

Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Fire Detection and Safety Systems
Physical Sciences →  Engineering →  Safety, Risk, Reliability and Quality