We present an index structure to support the approximate keyword search in text databases. In an approximate keyword search query, the user presents a query word Q and a tolerance value k (k \geqslant 0), and wishes to find all documents in the database that contain the query word Q or any other word in the vocabulary that matches Q approximately (We say that two words match each other approximately if the edit distance between them does not exceed the tolerance value k. In a typical text database application, a user will choose k = 1, 2, 3, or 4). Our index structure is built on the underlying vocabulary of the text database. The new technique has two principal components a new data structure called the V-tree and its partition methods for clustering words in the vocabulary into subgroups. We have implemented our index structure and conducted experiments on real-world data. Our experiments show that even for very large text database, the construction of our index and a search for keywords that match the query word approximately can be done quickly. Our implemntation makes it clear that the V-tree data structure can be easily integrated into existing access structures built on the database such as the inverted index file.
Sang‐Chul LeeSang‐Wook KimJunghoon LeeJae Soo Yoo
Muhammad Mahbubur RahmanJelena Tešić
Gonzalo NavarroErkki SutinenJani TanninenJorma Tarhio
Gonzalo NavarroErkki SutinenJorma Tarhio