JOURNAL ARTICLE

A New Indexing Method for Approximate Search in Text Databases

Abstract

We present an index structure to support approximate keyword search in text databases. In an approximate keyword search query, the user supplies a query word Q and a tolerance value k (k ≥ 0) and wishes to find all documents in the database that contain Q or any other word in the vocabulary that matches Q approximately. We say that two words match each other approximately if the edit distance between them does not exceed the tolerance value k; in a typical text database application, a user will choose k = 1, 2, 3, or 4. Our index structure is built on the underlying vocabulary of the text database. The new technique has two principal components: a new data structure called the V-tree, and its partition methods for clustering the words of the vocabulary into subgroups. We have implemented our index structure and conducted experiments on real-world data. Our experiments show that, even for very large text databases, both the construction of our index and the search for keywords that approximately match the query word can be done quickly. Our implementation also demonstrates that the V-tree can be easily integrated into existing access structures built on the database, such as the inverted index file.
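The abstract does not describe how the V-tree or its partition methods work internally, so the sketch below does not attempt to reproduce them. It is a minimal Python illustration, under our own assumptions, of the query semantics the abstract defines: the edit (Levenshtein) distance that decides whether two words match within tolerance k; a hypothetical class `LengthPartitionedVocabulary` that clusters the vocabulary into subgroups by word length, as a simple stand-in for partitioning (any word within edit distance k of Q must have a length within k of |Q|, so other groups can be skipped); and a lookup that unions the inverted-index posting lists of every matching word. All names here are illustrative, not taken from the paper.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost insert, delete, substitute) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters are equal)
            )
        prev = curr
    return prev[-1]


class LengthPartitionedVocabulary:
    """Hypothetical vocabulary index that groups words by length.

    This is NOT the V-tree. It only illustrates clustering the vocabulary into
    subgroups so that whole groups can be skipped at query time, using the fact
    that edit_distance(q, w) >= abs(len(q) - len(w)).
    """

    def __init__(self, inverted_index: dict[str, set[int]]):
        # Assumed inverted index: vocabulary word -> IDs of documents containing it.
        self.inverted_index = inverted_index
        self.groups: dict[int, list[str]] = defaultdict(list)
        for word in inverted_index:
            self.groups[len(word)].append(word)

    def matching_words(self, query: str, k: int) -> list[str]:
        """All vocabulary words whose edit distance to query is at most k, sorted."""
        result = []
        # Only length groups within k of len(query) can contain a match.
        for length in range(max(0, len(query) - k), len(query) + k + 1):
            for word in self.groups.get(length, []):
                if edit_distance(query, word) <= k:
                    result.append(word)
        return sorted(result)

    def search(self, query: str, k: int) -> set[int]:
        """IDs of documents containing query or any word that approximately matches it."""
        docs: set[int] = set()
        for word in self.matching_words(query, k):
            docs |= self.inverted_index[word]
        return docs


# Usage on a toy vocabulary (made-up words and document IDs).
index = LengthPartitionedVocabulary({
    "database": {1, 2},
    "databases": {2, 3},
    "datbase": {4},
    "index": {1, 5},
    "indexes": {3},
})
print(index.matching_words("database", 1))  # ['database', 'databases', 'datbase']
print(sorted(index.search("database", 1)))  # [1, 2, 3, 4]
```

Grouping by length is only a coarse filter, and every surviving word still costs a full dynamic-programming pass; the sketch makes no claim about how the V-tree partitions the vocabulary or how much of it the V-tree avoids examining.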

Keywords:
Computer science, Inverted index, Search engine indexing, Information retrieval, Partition (number theory), Index (typography), Word (group theory), Vocabulary, Database index, Database View, Cluster analysis, Data structure, Database design, Artificial intelligence, World Wide Web, Mathematics

Metrics

Cited by: 8
FWCI (Field Weighted Citation Impact): 1.62
References: 14
Citation Normalized Percentile: 0.84

Topics

Data Management and Algorithms (Physical Sciences → Computer Science → Signal Processing)
Algorithms and Data Compression (Physical Sciences → Computer Science → Artificial Intelligence)
Advanced Database Systems and Queries (Physical Sciences → Computer Science → Computer Networks and Communications)

Related Documents

BOOK-CHAPTER

Fast Approximate Search in Text Databases

Fei Shi

Lecture Notes in Computer Science, 2004, pp. 259-267

JOURNAL ARTICLE

Hybrid Approximate Nearest Neighbor Indexing and Search (HANNIS) for Large Descriptor Databases

Muhammad Mahbubur Rahman, Jelena Tešić

2022 IEEE International Conference on Big Data (Big Data), 2022, pp. 3895-3902

BOOK-CHAPTER

Indexing Text with Approximate q-Grams

Gonzalo Navarro, Erkki Sutinen, Jani Tanninen, Jorma Tarhio

Lecture Notes in Computer Science, 2000, pp. 350-363

JOURNAL ARTICLE

Indexing Text with Approximate q-Grams

Gonzalo Navarro, Erkki Sutinen, Jorma Tarhio

Journal of Discrete Algorithms, 2004, Vol. 3 (2-4), pp. 157-175