Multi-object tracking is an important problem in computer vision: it assigns unique identities to detected objects and maintains the association between them over time in real-time applications. However, most trackers struggle to achieve both high accuracy and high speed. In this paper, we propose a tracking technique that combines high-speed detections from YOLO with deep feature extraction from a convolutional neural network. The extracted features, together with position vectors and color histograms, are matched between consecutive frames to associate pedestrians. The approach addresses common challenges such as gradual appearance changes across frames (in shape, size, or illumination), partial occlusions, and re-identification of pedestrians who re-enter the view or emerge after being occluded for a number of frames. We use the YOLO framework for fast object detection, a custom CNN based on the MobileNet architecture for feature extraction, and a set of algorithms to generate associations between frames. On the publicly available TownCentre dataset, our framework reaches a MOTA of 93.2%.
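The paper does not give the exact association rule, but the described cues (deep features, position vectors, color histograms) can be combined into a single cost and matched greedily between frames. The sketch below is an illustrative, hedged implementation in pure Python: the weights, the greedy matcher, and the cost threshold `max_cost` are assumptions, not the authors' method.

```python
import math

def cosine_distance(a, b):
    # Appearance cue: 1 - cosine similarity of CNN feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def center_distance(p, q, diag):
    # Position cue: Euclidean distance of box centers, normalized
    # by the image diagonal so the term lies roughly in [0, 1].
    return math.hypot(p[0] - q[0], p[1] - q[1]) / diag

def histogram_distance(h, g):
    # Color cue: 1 - histogram intersection for normalized histograms.
    return 1.0 - sum(min(a, b) for a, b in zip(h, g))

def association_cost(track, det, diag, w=(0.5, 0.3, 0.2)):
    # Weighted sum of the three cues; the weights are illustrative.
    return (w[0] * cosine_distance(track["feat"], det["feat"])
            + w[1] * center_distance(track["pos"], det["pos"], diag)
            + w[2] * histogram_distance(track["hist"], det["hist"]))

def greedy_match(tracks, dets, diag, max_cost=0.7):
    # Greedily accept the cheapest (track, detection) pairs; pairs
    # above max_cost are left unmatched (lost track / new pedestrian).
    pairs = sorted((association_cost(t, d, diag), i, j)
                   for i, t in enumerate(tracks)
                   for j, d in enumerate(dets))
    used_t, used_d, matches = set(), set(), []
    for cost, i, j in pairs:
        if cost <= max_cost and i not in used_t and j not in used_d:
            used_t.add(i)
            used_d.add(j)
            matches.append((i, j))
    return matches
```

A Hungarian (optimal) assignment could replace the greedy loop; greedy is shown only to keep the sketch dependency-free.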