Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents

Weiwei Sun; Lingyong Yan; Xinyu Ma; Shuaiqiang Wang; Pengjie Ren; Zhumin Chen; Dawei Yin; Zhaochun Ren

doi:10.18653/v1/2023.emnlp-main.923

ScienceGate Book Chapters

JOURNAL ARTICLE

Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents

Weiwei Sun Lingyong Yan Xinyu Ma Shuaiqiang Wang Pengjie Ren Zhumin Chen Dawei Yin Zhaochun Ren

Year: 2023

DOI: 10.18653/v1/2023.emnlp-main.923

Get Full-Text PDF Get Analytical Report

Abstract

Large Language Models (LLMs) have demonstrated remarkable zero-shot generalization across various language-related tasks, including search engines. However, existing work utilizes the generative ability of LLMs for Information Retrieval (IR) rather than direct passage ranking. The discrepancy between the pre-training objectives of LLMs and the ranking objective poses another challenge. In this paper, we first investigate generative LLMs such as ChatGPT and GPT-4 for relevance ranking in IR. Surprisingly, our experiments reveal that properly instructed LLMs can deliver competitive, even superior results to state-of-the-art supervised methods on popular IR benchmarks. Furthermore, to address concerns about data contamination of LLMs, we collect a new test set called NovelEval, based on the latest knowledge and aiming to verify the model's ability to rank unknown knowledge. Finally, to improve efficiency in real-world applications, we delve into the potential for distilling the ranking capabilities of ChatGPT into small specialized models using a permutation distillation scheme. Our evaluation results turn out that a distilled 440M model outperforms a 3B supervised model on the BEIR benchmark. The code to reproduce our results is available at www.github.com/sunnweiwei/RankGPT.

Keywords:

Ranking (information retrieval) Computer science Benchmark (surveying) Language model Relevance (law) Machine learning Artificial intelligence Set (abstract data type) Generative model Code (set theory) Rank (graph theory) Generalization Information retrieval Generative grammar Mathematics

Metrics

150

Cited By

38.32

FWCI (Field Weighted Citation Impact)

Refs

1.00

Citation Normalized Percentile

Is in top 1%

Is in top 10%

Citation History

Topics

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Expert finding and Q&A systems

Physical Sciences → Computer Science → Information Systems

Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents

Abstract

Metrics

Citation History

Topics

Related Documents

Enhancing Recommendation Diversity by Re-ranking with Large Language Models

EcoRank: Budget-Constrained Text Re-ranking Using Large Language Models

Re-ranking search results using language models of query-specific clusters

Clinical trial search: Using biomedical language understanding models for re-ranking

Re-ranking models for spoken language understanding