JOURNAL ARTICLE

An efficient object tracking based on multi‐head cross‐attention transformer

Abstract

Object tracking is an essential component of computer vision and plays a significant role in various practical applications. Recently, transformer‐based trackers have become the predominant tracking method owing to their robustness and efficiency. However, existing transformer‐based trackers typically focus solely on the template features, neglecting the interaction between the search features and the template features during tracking. To address this issue, this article introduces a multi‐head cross‐attention transformer for visual tracking (MCTT), which effectively enhances the interaction between the template branch and the search branch, enabling the tracker to prioritize discriminative features. Additionally, an auxiliary segmentation mask head is designed to produce a pixel‐level feature representation, improving tracking accuracy by predicting a set of binary masks. Comprehensive experiments on benchmark datasets such as LaSOT, GOT‐10k, UAV123 and TrackingNet, against various advanced methods, demonstrate that our approach achieves promising tracking performance; MCTT achieves an AO score of 72.8 on GOT‐10k.
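The core mechanism the abstract describes, search-region features attending to template features via multi-head cross-attention, can be sketched as below. This is an illustrative NumPy sketch, not the paper's implementation: the random projection weights stand in for learned parameters, and all shapes and names (`search`, `template`, `num_heads`) are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(search, template, num_heads=4, seed=0):
    """Search tokens (queries) attend to template tokens (keys/values).

    search:   (n_s, d) search-region features
    template: (n_t, d) template features
    Returns:  (n_s, d) search features enriched with template context.
    """
    n_s, d = search.shape
    n_t, _ = template.shape
    assert d % num_heads == 0
    d_h = d // num_heads
    rng = np.random.default_rng(seed)
    # Illustrative random projections; a real tracker learns these weights.
    Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))

    # Split the embedding into heads: (tokens, d) -> (heads, tokens, d_h).
    q = (search @ Wq).reshape(n_s, num_heads, d_h).transpose(1, 0, 2)
    k = (template @ Wk).reshape(n_t, num_heads, d_h).transpose(1, 0, 2)
    v = (template @ Wv).reshape(n_t, num_heads, d_h).transpose(1, 0, 2)

    # Scaled dot-product attention per head: (heads, n_s, n_t).
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_h))
    # Weighted sum of template values, then concatenate heads and project out.
    out = (attn @ v).transpose(1, 0, 2).reshape(n_s, d)
    return out @ Wo

search = np.random.default_rng(1).standard_normal((16, 32))   # 16 search tokens
template = np.random.default_rng(2).standard_normal((4, 32))  # 4 template tokens
fused = multi_head_cross_attention(search, template)
print(fused.shape)  # (16, 32)
```

Because the queries come from the search branch while keys and values come from the template branch, each search location is re-weighted by its similarity to the template, which is how the tracker can emphasize discriminative features of the target.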

Keywords:
Computer science, Transformer, Artificial intelligence, Computer vision, Video tracking, Object tracking

Metrics

Cited by: 1
FWCI (Field Weighted Citation Impact): 0.53
References: 36
Citation Normalized Percentile: 0.53

Topics

Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Visual Attention and Saliency Detection
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Fire Detection and Safety Systems
Physical Sciences →  Engineering →  Safety, Risk, Reliability and Quality

Related Documents

JOURNAL ARTICLE

Improving the Execution Speed of Transformer-based Object Tracking Models through Multi-head Attention Parallelization

Inmo Kim, Myung‐Sun Kim

Journal: Journal of the Institute of Electronics and Information Engineers, Year: 2023, Vol: 60 (4), Pages: 39-47
JOURNAL ARTICLE

Multi-Head-Self-Attention based YOLOv5X-transformer for multi-scale object detection

Ponduri Vasanthi, Laavanya Mohan

Journal: Multimedia Tools and Applications, Year: 2023, Vol: 83 (12), Pages: 36491-36517
JOURNAL ARTICLE

SMSTracker: A Self-Calibration Multi-Head Self-Attention Transformer for Visual Object Tracking

Zhongyang Wang, Hu Zhu, Feng Liu

Journal: Computers, Materials & Continua, Year: 2024, Vol: 80 (1), Pages: 605-623