Pseudo-relevance feedback (PRF) via query expansion has been proven to be effective in many information retrieval (IR) tasks. In most existing work, the top-ranked documents from an initial search are assumed to be relevant and used for PRF. One problem with this approach is that one or more of the top retrieved documents may be non-relevant, which can introduce noise into the feedback process. Moreover, existing methods generally do not take into account the significantly different types of queries that are often entered into an IR system. Intuitively, Wikipedia can be seen as a large, manually edited document collection which could be exploited to improve document retrieval effectiveness within PRF. It is not obvious how we might best utilize information from Wikipedia in PRF, and to date, the potential of Wikipedia for this task has been largely unexplored. In our work, we present a systematic exploration of the utilization of Wikipedia in PRF for query-dependent expansion. Specifically, we classify TREC topics into three categories based on Wikipedia: 1) entity queries, 2) ambiguous queries, and 3) broader queries. We propose and study the effectiveness of three methods for expansion term selection, each modeling the Wikipedia-based pseudo-relevance information from a different perspective. We incorporate the expansion terms into the original query and use language modeling IR to evaluate these methods. Experiments on four TREC test collections, including the large web collection GOV2, show that retrieval performance of each type of query can be improved. In addition, we demonstrate that the proposed method outperforms the baseline relevance model in terms of precision and robustness.
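The core PRF loop described above (assume the top-ranked documents are relevant, select expansion terms from them, and append those terms to the original query) can be sketched in a few lines. This is a minimal illustrative sketch using raw term frequencies as the selection criterion, not the paper's Wikipedia-based or relevance-model scoring; the function name and parameters are hypothetical.

```python
from collections import Counter

def prf_expand(query, ranked_docs, k=3, n_terms=2):
    """Toy pseudo-relevance feedback: expand `query` with the most
    frequent non-query terms from the top-k ranked documents.
    Real systems would weight terms with a relevance model over
    language-model probabilities rather than raw counts."""
    query_terms = set(query.split())
    counts = Counter()
    # Treat the top-k retrieved documents as pseudo-relevant.
    for doc in ranked_docs[:k]:
        for term in doc.lower().split():
            if term not in query_terms:
                counts[term] += 1
    # Select the n_terms most frequent candidate expansion terms.
    expansion = [t for t, _ in counts.most_common(n_terms)]
    # Incorporate the expansion terms into the original query.
    return query.split() + expansion
```

In a query-dependent scheme such as the one studied here, `k`, `n_terms`, and the term-selection method would vary with the query category (entity, ambiguous, or broader) rather than being fixed.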