Abstract

Existing referring understanding tasks tend to involve the detection of a single text-referred object. In this paper, we propose a new and general referring understanding task, termed referring multi-object tracking (RMOT). Its core idea is to employ a language expression as a semantic cue to guide the prediction of multi-object tracking. To the best of our knowledge, this is the first work to achieve an arbitrary number of referent object predictions in videos. To push forward RMOT, we construct a benchmark with scalable expressions based on KITTI, named Refer-KITTI. Specifically, it provides 18 videos with 818 expressions, and each expression in a video is annotated with an average of 10.7 objects. Further, we develop a transformer-based architecture, TransRMOT, to tackle the new task in an online manner; it achieves impressive detection performance and outperforms other counterparts. The Refer-KITTI dataset and the code are released at https://referringmot.github.io.
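
To make the task formulation concrete, below is a minimal Python sketch of the RMOT interface implied by the abstract: a video sequence and one language expression go in, and an arbitrary number of referred object tracks (zero, one, or many) come out, produced online frame by frame. The model object, its step() method, and all data-structure names are illustrative assumptions, not the released TransRMOT API.

```python
# Minimal sketch (not the authors' released code) of the RMOT task interface:
# one language expression plus a video in, an arbitrary number of referred
# object tracks out. The model, its .step() method, and all field names are
# illustrative assumptions.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

import numpy as np


@dataclass
class ReferredTrack:
    track_id: int  # identity kept consistent across frames
    # each entry: (frame_index, x1, y1, x2, y2)
    boxes: List[Tuple[int, float, float, float, float]] = field(default_factory=list)


@dataclass
class RMOTSample:
    frames: List[np.ndarray]   # T RGB frames of one Refer-KITTI sequence
    expression: str            # e.g. "the cars which are turning left"


def run_online_rmot(model, sample: RMOTSample) -> List[ReferredTrack]:
    """Process a sequence frame by frame, as an online tracker would.

    `model.step(frame, expression)` is assumed to return (track_id, box) pairs
    for the objects matching the expression in the current frame; the model is
    expected to keep its own memory of previously seen frames.
    """
    tracks: Dict[int, ReferredTrack] = {}
    for t, frame in enumerate(sample.frames):
        for track_id, box in model.step(frame, sample.expression):
            tracks.setdefault(track_id, ReferredTrack(track_id))
            tracks[track_id].boxes.append((t, *box))
    return list(tracks.values())
```

Note that, unlike single-object referring tasks, the output is a set of tracks whose size depends on the expression, which is what distinguishes RMOT from prior referring understanding settings.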

Keywords:
Computer vision, artificial intelligence, machine learning, natural language processing, object detection, video tracking, transformer, referent, expression, benchmark, scalability, segmentation

Metrics

Cited by: 68
FWCI (Field Weighted Citation Impact): 12.37
References: 84
Citation Normalized Percentile: 0.99 (top 1%)

Topics

Multimodal Machine Learning Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Speech and Dialogue Systems (Physical Sciences → Computer Science → Artificial Intelligence)
Human Pose and Action Recognition (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)

Related Documents

JOURNAL ARTICLE

Cross-View Referring Multi-Object Tracking
Sijia Chen, En Yu, Wenbing Tao
Proceedings of the AAAI Conference on Artificial Intelligence, 2025, Vol. 39 (2), pp. 2204-2211

JOURNAL ARTICLE

EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving
Jiacheng Lin, Jiajun Chen, Kunyu Peng, Xuan He, Zhiyong Li, Rainer Stiefelhagen, Kailun Yang
IEEE Transactions on Intelligent Transportation Systems, 2024, Vol. 25 (11), pp. 18964-18977