JOURNAL ARTICLE

Finding a set of high-frequency queries for high-frequency-query-based filter for similarity join

Abstract

Similarity search and similarity join are two important operations in text databases. The filter-and-verify framework aims to reduce comparison time by filtering out some pairs of texts before actually comparing the remaining pairs. Many filter methods do not take into account the repetition of query words over time. A query that is frequently repeated over a time period is called a high-frequency query. The high-frequency-query-based filter is a filter method designed for this type of query. The performance of this method depends on the choice of high-frequency queries. This paper proposes methods to find the set of high-frequency queries from a given query set. One method uses DBSCAN and the other uses DBSCAN with a merging strategy, called DBSM. The experimental results show that both DBSCAN and DBSM can find high-frequency queries, but the set of high-frequency queries obtained from DBSM gives higher pruning power for the high-frequency-query-based filter.
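To make the idea of clustering a query set with DBSCAN concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not state the distance measure, the parameter values, or how a representative is chosen from each cluster, so the Jaccard distance on word sets, the `eps` and `min_pts` values, the "most repeated string" rule, and the names `jaccard_distance`, `dbscan`, and `high_frequency_candidates` are all assumptions made for illustration. The merging strategy of DBSM is not described in the abstract and is not sketched here.

```python
from collections import Counter, deque


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two token sets (assumed distance)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dbscan(points, eps, min_pts, dist):
    """Plain DBSCAN: one label per point, cluster id >= 0 or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Brute-force eps-neighborhood; includes the point itself.
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


def high_frequency_candidates(queries, eps=0.3, min_pts=3):
    """Cluster a query log and return one candidate query per dense cluster."""
    token_sets = [frozenset(q.lower().split()) for q in queries]
    labels = dbscan(token_sets, eps, min_pts, jaccard_distance)
    clusters = {}
    for query, label in zip(queries, labels):
        if label >= 0:
            clusters.setdefault(label, []).append(query)
    # Assumed rule: the most repeated string in a cluster represents it.
    return [Counter(members).most_common(1)[0][0] for members in clusters.values()]


# Hypothetical query log, invented for this example.
queries = [
    "cheap flights bangkok",
    "cheap flights bangkok",
    "cheap flights to bangkok",
    "python dbscan example",
    "python dbscan example",
    "dbscan python example",
    "weather tomorrow",
]
print(high_frequency_candidates(queries, eps=0.3, min_pts=3))
```

In this sketch a query that appears only once, such as "weather tomorrow", is labeled as noise and produces no candidate, while repeated or near-duplicate queries form dense clusters. The brute-force neighborhood search is quadratic in the number of queries, which is acceptable for an illustration but would likely need an index for a real query log.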

Keywords:
Computer science; Filter (signal processing); Set (abstract data type); Similarity (geometry); Query optimization; Information retrieval; Pruning; Result set; Data mining; Artificial intelligence

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.27
Refs: 7
Citation Normalized Percentile: 0.54

Topics

Data Management and Algorithms
Physical Sciences → Computer Science → Signal Processing
Data Mining Algorithms and Applications
Physical Sciences → Computer Science → Information Systems
Data Quality and Management
Social Sciences → Decision Sciences → Management Science and Operations Research