Vehicle make and model recognition (VMMR) is an important task in intelligent transportation systems, but existing approaches struggle to adapt to newly released models. Contrastive Language–Image Pretraining (CLIP) provides strong visual–text alignment, yet its fixed pretrained weights limit performance on unseen models unless costly finetuning on new vehicle images is performed. We propose a pipeline that integrates vision–language models (VLMs) with Retrieval-Augmented Generation (RAG) to support zero-shot recognition through text-based reasoning. A VLM converts vehicle images into descriptive attributes, which are compared against a database of textual features. Relevant entries are retrieved and combined with the description to form a prompt, and a language model (LM) infers the make and model. This design avoids large-scale retraining and enables rapid updates by adding textual descriptions of new vehicles. Experiments show that the proposed method improves recognition accuracy by nearly 20% over the CLIP baseline, demonstrating the potential of RAG-enhanced LM reasoning for scalable VMMR in smart-city applications.
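The retrieval-and-prompting steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the VLM and LM calls are omitted, the database contents and names such as `vehicle_db`, `retrieve`, and `build_prompt` are hypothetical, and a simple bag-of-words cosine similarity stands in for whatever text matcher the actual pipeline uses.

```python
from collections import Counter
import math

# Hypothetical textual database of vehicle attributes (illustrative entries).
vehicle_db = {
    "Toyota Corolla 2023": "compact sedan, slim LED headlights, trapezoidal grille",
    "Ford F-150 2022": "full-size pickup, C-clamp headlights, large chrome grille",
    "Tesla Model 3 2023": "fastback sedan, no front grille, minimalist headlights",
}

def tokenize(text):
    """Lowercase and split a description into punctuation-stripped tokens."""
    return [w.strip(",.").lower() for w in text.split()]

def cosine_sim(a, b):
    """Bag-of-words cosine similarity between two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(description, db, k=2):
    """Return the k database entries most similar to the VLM's description."""
    q = tokenize(description)
    scored = sorted(db.items(),
                    key=lambda kv: cosine_sim(q, tokenize(kv[1])),
                    reverse=True)
    return scored[:k]

def build_prompt(description, retrieved):
    """Combine the description with retrieved entries into an LM prompt."""
    context = "\n".join(f"- {name}: {attrs}" for name, attrs in retrieved)
    return (f"Image description: {description}\n"
            f"Candidate vehicles:\n{context}\n"
            "Which make and model best matches the description?")

# In the full pipeline this description would be produced by the VLM.
desc = "a fastback sedan with no front grille and minimalist headlights"
candidates = retrieve(desc, vehicle_db)
print(build_prompt(desc, candidates))
```

Because new vehicles enter the system as plain-text entries in `vehicle_db`, supporting a newly released model requires only adding its description, with no retraining of visual weights.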