Abstract

Retrieving specific vehicle tracks from natural language (NL) descriptions is a convenient way to monitor vehicle movement patterns and traffic-related events. NL-based image retrieval has applications in smart cities and traffic control, among other areas. In this work, we propose TIED, a text-to-image encoder-decoder model that jointly extracts visual and textual information for vehicle track retrieval. The model consists of an encoder network that projects the two modalities into a common latent space and a decoder network that performs the inverse mapping back to the text descriptions. The method exploits visual semantic attributes of the target vehicle along with a cycle-consistency loss, and uses both intra-modal and inter-modal relationships to improve retrieval performance. Our system is competitive, placing 7th in the Natural Language-Based Vehicle Retrieval public track of the 2021 NVIDIA AI City Challenge. We demonstrate that the proposed TIED model obtains a Mean Reciprocal Rank (MRR) six times higher than the baseline, achieving an MRR of 15.48. The code and models will be made publicly available.
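
For reference, MRR averages, over all text queries, the reciprocal of the rank at which the correct vehicle track appears in the returned list. The following is a minimal Python sketch of that metric only, not the authors' evaluation code; the function name and the track IDs are hypothetical. The raw value lies in [0, 1], so the reported 15.48 would correspond to 0.1548 if it is stated on a 0-100 scale, which is an assumption here.

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    targets: Sequence[Hashable],
) -> float:
    """Average of 1/rank of the correct item over all queries.

    A query whose target does not appear in its ranking contributes 0.
    """
    if len(rankings) != len(targets):
        raise ValueError("need exactly one target per ranking")
    if not rankings:
        raise ValueError("need at least one query")
    total = 0.0
    for ranked, target in zip(rankings, targets):
        # Ranks are 1-based: the top result has rank 1.
        for position, candidate in enumerate(ranked, start=1):
            if candidate == target:
                total += 1.0 / position
                break
    return total / len(rankings)


# Three hypothetical text queries, each with a ranked list of track IDs.
rankings = [
    ["t1", "t2", "t3", "t4"],  # correct track t1 is at rank 1
    ["t2", "t1", "t3", "t4"],  # correct track t1 is at rank 2
    ["t2", "t3", "t4", "t1"],  # correct track t1 is at rank 4
]
targets = ["t1", "t1", "t1"]

mrr = mean_reciprocal_rank(rankings, targets)  # (1 + 1/2 + 1/4) / 3
print(f"MRR = {mrr:.4f}")
```

Because each query contributes only the reciprocal of its first correct hit, the metric rewards placing the right track near the top and is insensitive to the ordering of the remaining candidates.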

Keywords:
Computer science; Encoder; Mean reciprocal rank; Consistency (knowledge bases); Image retrieval; Information retrieval; Natural language; Artificial intelligence; Computer vision; Image (mathematics); Data mining

Metrics

Cited By: 7
FWCI (Field Weighted Citation Impact): 0.61
Refs: 58
Citation Normalized Percentile: 0.68

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Image Retrieval and Classification Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
