JOURNAL ARTICLE

An index structure for similarity join based on high-frequency queries

Abstract

String databases are widely used in many applications today, and many of these applications need to find texts that are similar to a query text. Similarity join finds all pairs of texts whose similarity exceeds a given threshold. Much research has aimed to reduce the running time of similarity join. The filter-and-verify framework is one approach: it first filters out dissimilar pairs of texts and then verifies the remaining candidate pairs. Prefix filtering is a filter-and-verify method that eliminates dissimilar pairs by comparing only the prefixes of the texts. However, these similarity join algorithms disregard how frequently each query occurs. Data collected from Google Trends show that some queries occur more frequently than others. This paper aims to reduce the running time of similarity join by focusing on these high-frequency queries: indices are built for them to speed up both the high-frequency queries themselves and any queries similar to them. The proposed indices and similarity join algorithm are implemented and their performance is evaluated experimentally. The experiments show that the proposed method outperforms AdaptSearch, a leading similarity join algorithm, when the queries are similar to a high-frequency query.
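The filter-and-verify framework with prefix filtering can be illustrated with a small self-join. This is a minimal sketch of generic prefix filtering, not the authors' high-frequency-query index and not AdaptSearch. It assumes whitespace word tokens, Jaccard similarity, and an inclusive threshold (similarity at least `t`); the function names `similarity_join`, `prefix_length`, and `jaccard` are hypothetical. Tokens in each record are sorted by one global order (rarest first), and a pair can only reach the threshold if the two prefixes share a token, so only such pairs are verified.

```python
import math
from collections import Counter, defaultdict


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b| of two non-empty token sets."""
    return len(a & b) / len(a | b)


def prefix_length(size: int, t: float) -> int:
    """Prefix length for a record of `size` tokens under Jaccard threshold t.

    If jaccard(x, y) >= t, then x and y share at least ceil(t * |x|) tokens,
    so under one global token order their prefixes must share a token.
    The small epsilon guards against floating-point error when t * size
    should be an exact integer (it can only lengthen the prefix, which is safe).
    """
    return size - math.ceil(t * size - 1e-9) + 1


def similarity_join(records: list[str], t: float) -> list[tuple[int, int, float]]:
    """Hypothetical self-join: return (i, j, sim) with i < j and sim >= t."""
    token_sets = [set(r.lower().split()) for r in records]
    # Global order: rarest tokens first, so prefixes hold the most selective tokens.
    freq = Counter(tok for s in token_sets for tok in s)
    index = defaultdict(list)  # token -> ids of earlier records whose prefix contains it
    results = []
    for i, s in enumerate(token_sets):
        if not s:
            continue  # empty records are skipped in this sketch
        ordered = sorted(s, key=lambda tok: (freq[tok], tok))
        prefix = ordered[:prefix_length(len(ordered), t)]
        # Filter step: only earlier records sharing a prefix token become candidates.
        candidates = set()
        for tok in prefix:
            candidates.update(index[tok])
        # Verify step: compute the exact similarity for each surviving candidate.
        for j in sorted(candidates):
            sim = jaccard(token_sets[j], s)
            if sim >= t:
                results.append((j, i, sim))
        # Index this record's prefix so later records can probe it.
        for tok in prefix:
            index[tok].append(i)
    return results


if __name__ == "__main__":
    docs = [
        "similarity join on strings",
        "similarity join for strings",
        "prefix filtering for joins",
        "string similarity join",
    ]
    for i, j, sim in similarity_join(docs, 0.5):
        print(i, j, round(sim, 2))  # prints: 0 1 0.6
```

With threshold 0.5, records 0 and 3 become candidates because they share the prefix token "join", but verification rejects them (Jaccard 2/5), so only records 0 and 1 (Jaccard 3/5) are reported. Ordering tokens rarest-first is a common design choice because rare tokens produce short posting lists and therefore fewer candidates to verify.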

Keywords:
Join, Similarity, Computer science, Filter, Prefix, Information retrieval, Index, Data mining, Theoretical computer science, Algorithm, Database, Mathematics, Artificial intelligence, World Wide Web, Combinatorics

Metrics

Cited by: 3
FWCI (Field Weighted Citation Impact): 0.75
References: 10
Citation Normalized Percentile: 0.80


Topics

Data Quality and Management
Data Management and Algorithms
Web Data Mining and Analysis