JOURNAL ARTICLE

Parallel Multi-Scale Feature Fusion Transformer for Object Tracking

Abstract

In recent years, Transformers have yielded promising results in object tracking by enabling the extraction of comprehensive global information from image features. However, current state-of-the-art Transformer networks are limited in their ability to capture relevant local information from those features. To address this issue, this paper presents a novel framework, the parallel multi-scale feature fusion Transformer (PFOT). The main contributions of this study are as follows: 1) parallel multi-scale feature fusion across Convolutional Neural Networks (CNNs) and Transformers, which mitigates the weakness of existing Transformer-based tracking networks in capturing local information; 2) a progressive fusion approach that reconciles the spatial disparities inherent in multi-scale features; 3) the finding that this fusion strategy plays a pivotal role in enhancing the network's overall performance. Extensive experiments on three challenging benchmarks, NFS, OTB100, and UAV123, show that PFOT achieves performance comparable to state-of-the-art tracking algorithms.
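The abstract does not specify PFOT's layers or its fusion operator, so the following is only a minimal sketch of the general idea of fusing two parallel branches at each scale and then merging scales progressively from coarse to fine. Every name here (`upsample2x`, `fuse`, `progressive_fusion`, `parallel_branch_fusion`) is hypothetical, the feature maps are single-channel toy grids, and the element-wise average stands in for whatever learned fusion the paper actually uses.

```python
# Illustrative sketch only: not the authors' implementation.
# Feature maps are plain 2-D lists (rows of floats), one channel each.

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out

def fuse(a, b):
    """Element-wise average of two same-sized maps (assumed stand-in for a learned fusion)."""
    assert len(a) == len(b) and len(a[0]) == len(b[0]), "spatial sizes must match"
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def progressive_fusion(pyramid):
    """Fuse maps ordered coarse -> fine, merging one adjacent scale at a time."""
    fused = pyramid[0]
    for finer in pyramid[1:]:
        # Upsampling first removes the spatial mismatch between the two scales.
        fused = fuse(upsample2x(fused), finer)
    return fused

def parallel_branch_fusion(cnn_pyramid, transformer_pyramid):
    """Fuse the CNN and Transformer branches per scale, then across scales."""
    per_scale = [fuse(c, t) for c, t in zip(cnn_pyramid, transformer_pyramid)]
    return progressive_fusion(per_scale)

def const(size, value):
    """Build a size x size map filled with a constant (toy input)."""
    return [[value] * size for _ in range(size)]

# Toy pyramids at 1x1, 2x2 and 4x4 resolution for each branch.
cnn_pyramid = [const(1, 8.0), const(2, 4.0), const(4, 2.0)]
transformer_pyramid = [const(1, 0.0), const(2, 0.0), const(4, 0.0)]
fused = parallel_branch_fusion(cnn_pyramid, transformer_pyramid)
```

Merging only adjacent scales at each step means no map is ever upsampled by more than a factor of two before it is combined, which is one plausible reading of how a progressive scheme copes with spatial disparities between scales.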

Keywords:
Computer science; Artificial intelligence; Computer vision; Fusion; Transformer; Tracking (education); Video tracking; Scale (ratio); Feature (linguistics); Feature tracking; Object (grammar); Pattern recognition (psychology); Engineering; Cartography; Electrical engineering; Voltage; Geography

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 16
Citation Normalized Percentile: 0.11
Is in top 1%
Is in top 10%

Topics

Infrared Target Detection Methodologies
Physical Sciences →  Engineering →  Aerospace Engineering
Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Face and Expression Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition