Jing Wang, Yanru Wang, Dan Yuan, Ying Que, Weichao Huang, Wei Yuan
The TransT object tracking algorithm, built on the Transformer architecture, integrates deep feature extraction with attention mechanisms, which improves tracking stability and accuracy. However, it suffers from insufficient tracking accuracy and bounding box drift when the background contains clutter similar to the object, which directly affects the subsequent tracking process. To overcome this problem, this paper constructs a semantic enhancement model that uses multi-layer feature representations extracted from a deep network and fuses shallow features with deep features through cross-attention. To adapt to changes in the object's surroundings and to discriminate the object from similar objects, the paper also proposes a dynamic mask strategy that optimizes the attention allocation mechanism. Finally, an object template update mechanism compares the spatio-temporal information of successive frames and promptly updates the object template, which improves the model's adaptability and further enhances tracking performance in complex scenes. Experimental comparisons demonstrate that the proposed algorithm effectively handles similar background clutter, leading to a significant improvement in the overall performance of the tracking model.
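The cross-attention fusion of shallow and deep features mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `softmax` and `cross_attention_fuse` are hypothetical, the learned query/key/value projections and multi-head structure of a real Transformer layer are omitted, and the residual addition is an assumption. Deep features act as queries, and shallow features act as keys and values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(deep, shallow):
    """Fuse shallow features into deep features with scaled dot-product
    cross-attention (illustrative only, no learned projections).

    deep:    list of query vectors, each of length d
    shallow: list of key/value vectors, each of length d
    returns: one fused vector per deep vector (residual + attended shallow)
    """
    d = len(deep[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in deep:
        # Similarity of this deep token to every shallow token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in shallow]
        weights = softmax(scores)
        # Attention-weighted sum of the shallow vectors.
        attended = [sum(w * v[j] for w, v in zip(weights, shallow)) for j in range(d)]
        # Assumed residual connection: keep the deep feature, add shallow detail.
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


deep_tokens = [[1.0, 0.0]]
shallow_tokens = [[1.0, 0.0], [1.0, 0.0]]
fused_tokens = cross_attention_fuse(deep_tokens, shallow_tokens)
```

In this toy call the two shallow tokens are identical, so they receive equal attention weights and the fused vector is the deep token plus one copy of the shallow token.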
Wenli Zhang, Yitao Xin, Chao Zheng, Xinyu Peng, Jian Tai
Feng Wen, Haixin Huang, Xiangyang Yin, Junguang Ma, Xiaojie Hu
Junwei Hu, Jinlong Chen, Minghao Yang, Yiming Jiang