A New Indexing Method for Approximate Search in Text Databases

Fei Shi; C. Mefford

doi:10.1109/cit.2005.23

JOURNAL ARTICLE

Year: 2005 Vol: 13 Pages: 70-76

Abstract

We present an index structure to support approximate keyword search in text databases. In an approximate keyword search query, the user presents a query word Q and a tolerance value k (k ≥ 0), and wishes to find all documents in the database that contain the query word Q or any other word in the vocabulary that matches Q approximately. (We say that two words match each other approximately if the edit distance between them does not exceed the tolerance value k. In a typical text database application, a user will choose k = 1, 2, 3, or 4.) Our index structure is built on the underlying vocabulary of the text database. The new technique has two principal components: a new data structure called the V-tree, and its partition methods for clustering words in the vocabulary into subgroups. We have implemented our index structure and conducted experiments on real-world data. Our experiments show that even for a very large text database, the construction of our index and a search for keywords that match the query word approximately can be done quickly. Our implementation makes it clear that the V-tree data structure can be easily integrated into existing access structures built on the database, such as the inverted index file.
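
The query model described in the abstract (find every vocabulary word within edit distance k of Q, then return the documents containing those words) can be illustrated with a small sketch. This is not the V-tree, whose details are not given here; it is only a brute-force baseline that scans the whole vocabulary, assuming Levenshtein edit distance, whitespace tokenization, and a plain in-memory inverted index. The names `edit_distance`, `build_inverted_index`, and `approximate_search` are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Set


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each vocabulary word to the set of document ids containing it.

    Assumption: words are lowercased, whitespace-separated tokens.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


def approximate_search(index: Dict[str, Set[int]], query: str, k: int) -> Dict[str, Set[int]]:
    """Return {word: doc ids} for every vocabulary word within distance k.

    Brute-force scan of the vocabulary (not the V-tree from the paper).
    """
    q = query.lower()
    matches = {}
    for word, postings in index.items():
        # Cheap filter: a length gap larger than k rules the word out.
        if abs(len(word) - len(q)) > k:
            continue
        if edit_distance(q, word) <= k:
            matches[word] = postings
    return matches


docs = {
    1: "approximate search in text",
    2: "aproximate serch",
    3: "exact matching",
}
index = build_inverted_index(docs)
result = approximate_search(index, "search", 1)
```

With these toy documents and k = 1, the query "search" matches both "search" and the misspelling "serch". The scan costs one distance computation per vocabulary word that passes the length filter; an index that clusters the vocabulary into subgroups, as the abstract describes, would aim to avoid examining most of those words.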

Keywords:

Computer science; Inverted index; Search engine indexing; Information retrieval; Partition (number theory); Index (typography); Word (group theory); Vocabulary; Database index; Database View; Cluster analysis; Data structure; Database design; Artificial intelligence; World Wide Web; Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 1.62

Citation Normalized Percentile: 0.84

Topics

Data Management and Algorithms

Physical Sciences → Computer Science → Signal Processing

Algorithms and Data Compression

Physical Sciences → Computer Science → Artificial Intelligence

Advanced Database Systems and Queries

Physical Sciences → Computer Science → Computer Networks and Communications

Related Documents

Fast Approximate Search in Text Databases

Approximate indexing in road network databases

Hybrid Approximate Nearest Neighbor Indexing and Search (HANNIS) for Large Descriptor Databases

Indexing Text with Approximate q-Grams

Indexing text with approximate q-grams