JOURNAL ARTICLE

Concept-Based Information Retrieval Using Explicit Semantic Analysis

Ofer Egozi, Shaul Markovitch, Evgeniy Gabrilovich

Year: 2011 Journal: ACM Transactions on Information Systems Vol: 29 (2) Pages: 1-34

Abstract

Information retrieval systems traditionally rely on textual keywords to index and retrieve documents. Keyword-based retrieval may return inaccurate and incomplete results when different keywords are used to describe the same concept in the documents and in the queries. Furthermore, the relationship between these related keywords may be semantic rather than syntactic, and capturing it thus requires access to comprehensive human world knowledge. Concept-based retrieval methods have attempted to tackle these difficulties by using manually built thesauri, by relying on term cooccurrence data, or by extracting latent word relationships and concepts from a corpus. In this article we introduce a new concept-based retrieval approach based on Explicit Semantic Analysis (ESA), a recently proposed method that augments keyword-based text representation with concept-based features, automatically extracted from massive human knowledge repositories such as Wikipedia. Our approach generates new text features automatically, and we have found that high-quality feature selection becomes crucial in this setting to make the retrieval more focused. However, due to the lack of labeled data, traditional feature selection methods cannot be used, hence we propose new methods that use self-generated labeled training data. The resulting system is evaluated on several TREC datasets, showing superior performance over previous state-of-the-art results.
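
The core idea of representing text in a space of explicit concepts can be illustrated with a minimal sketch. This is a toy illustration under stated assumptions, not the authors' system: the three-entry `CONCEPTS` dictionary stands in for a knowledge repository such as Wikipedia, the function names are hypothetical, and the feature selection step described in the abstract is not reproduced here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy "knowledge repository": concept title -> article text.
# A real ESA setup would use a large collection such as Wikipedia articles.
CONCEPTS = {
    "Automobile": "car engine wheel road driver vehicle car",
    "Bank": "money loan deposit account interest money",
    "River": "water stream flow bank shore water",
}


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption made for brevity)."""
    return text.lower().split()


def build_word_to_concept_index(concepts):
    """Map each word to {concept: TF-IDF weight of the word in that concept}."""
    n = len(concepts)
    tf = {c: Counter(tokenize(t)) for c, t in concepts.items()}
    # Document frequency: in how many concept articles each word occurs.
    df = Counter(w for counts in tf.values() for w in counts)
    index = defaultdict(dict)
    for concept, counts in tf.items():
        for word, freq in counts.items():
            index[word][concept] = freq * math.log(n / df[word])
    return index


def concept_vector(text, index):
    """Represent a text as a weighted vector over concepts."""
    vec = defaultdict(float)
    for word, freq in Counter(tokenize(text)).items():
        for concept, weight in index.get(word, {}).items():
            vec[concept] += freq * weight
    return dict(vec)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


INDEX = build_word_to_concept_index(CONCEPTS)
```

In this sketch the query "vehicle" and the document "car driver" share no keywords, yet both map onto the "Automobile" concept, so `cosine(concept_vector("vehicle", INDEX), concept_vector("car driver", INDEX))` is high even though a pure keyword match would find nothing.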

Keywords:
Computer science, Information retrieval, Selection (genetic algorithm), Word (group theory), Representation (politics), Feature (linguistics), Natural language processing, Artificial intelligence

Metrics

Cited By: 281
FWCI (Field Weighted Citation Impact): 34.86
Refs: 84
Citation Normalized Percentile: 1.00
Is in top 1%
Is in top 10%

Topics

Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Data Jacket Retrieval Based on Explicit Semantic Analysis

Quexuan Zhang

Year: 2015 Vol: 3 Pages: 749-752

BOOK-CHAPTER

Information Retrieval Using Latent Semantic Analysis

Rahul Khokale, Nileshsingh V. Thakur, Mahendra S. Makesar, Nitin A. Koli

Smart Innovation, Systems and Technologies Year: 2019 Pages: 393-404

JOURNAL ARTICLE

GOING BEYOND EXPLICIT KNOWLEDGE FOR IMPROVED SEMANTIC BASED INFORMATION RETRIEVAL

Rada Mihalcea

Journal: International Journal of Artificial Intelligence Tools Year: 2002 Vol: 11 (04) Pages: 553-586