An index structure for similarity join based on high-frequency queries

Kamolwan Kunanusont; Jaruloj Chongstitvatana

doi:10.1109/icsec.2014.6978233

JOURNAL ARTICLE

Year: 2014 Pages: 415-420

Abstract

String databases are widely used in many applications, and searching for texts that are similar to query texts is a common requirement. Similarity join finds pairs of texts whose similarity exceeds a given threshold. Much research has been done to reduce the running time of similarity join. The filter-and-verify framework is one approach: it first filters out dissimilar pairs of texts and then verifies the remaining pairs. Prefix filtering is a filter-and-verify method that eliminates dissimilar pairs of texts by comparing only the prefixes of the texts. However, these similarity join algorithms disregard the frequencies of queries. Data collected from Google Trends explorer show that some queries appear with higher frequency than others. This paper aims to reduce the running time of similarity join by focusing on these high-frequency queries. Indices are built from the high-frequency queries to speed up those queries and any queries that are similar to them. The proposed indices and similarity join algorithm are implemented to evaluate their performance. Experiments show that the proposed method outperforms a leading similarity join algorithm, AdaptSearch, when queries are similar to a high-frequency query.
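
To make the filter-and-verify idea concrete, the following is a minimal sketch of textbook prefix filtering, not the index structure proposed in the paper and not AdaptSearch. It assumes whitespace tokenization, Jaccard similarity on token sets, and a pair qualifying when its similarity is at least the threshold; the abstract specifies none of these. The names jaccard and prefix_filter_join are hypothetical.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def prefix_filter_join(texts, threshold):
    """Return sorted index pairs (i, j), i < j, with Jaccard >= threshold.

    Filter step: two sets can only reach the threshold if their prefixes
    (under one global token order) share at least one token.
    Verify step: compute the exact similarity for surviving candidates.
    """
    token_sets = [set(t.lower().split()) for t in texts]

    # Global order: rarest tokens first, ties broken alphabetically.
    freq = Counter(tok for s in token_sets for tok in s)
    ordered = [sorted(s, key=lambda tok: (freq[tok], tok)) for s in token_sets]

    index = defaultdict(list)  # prefix token -> ids of texts already seen
    candidates = set()
    for i, tokens in enumerate(ordered):
        if not tokens:
            continue
        n = len(tokens)
        # Standard Jaccard prefix length; the small epsilon guards
        # against floating-point round-up inside ceil.
        prefix_len = n - math.ceil(threshold * n - 1e-9) + 1
        for tok in tokens[:prefix_len]:
            for j in index[tok]:
                candidates.add((j, i))
            index[tok].append(i)

    return sorted(
        (i, j)
        for i, j in candidates
        if jaccard(token_sets[i], token_sets[j]) >= threshold
    )


if __name__ == "__main__":
    sample = [
        "apple banana cherry",
        "apple banana cherry date",
        "kiwi lemon mango",
        "banana cherry date",
    ]
    print(prefix_filter_join(sample, 0.7))
```

In this sketch only pairs that share a prefix token reach the verification step, so most dissimilar pairs are never compared in full. The abstract's contribution, indices built around high-frequency queries, would sit on top of a filter like this one and is not shown here.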

Keywords:

Join (topology); Similarity (geometry); Computer science; Filter (signal processing); Prefix; Information retrieval; Index (typography); Data mining; Theoretical computer science; Algorithm; Database; Mathematics; Artificial intelligence; World Wide Web; Combinatorics

Metrics

FWCI (Field Weighted Citation Impact): 0.75

Citation Normalized Percentile: 0.80

Topics

Data Quality and Management

Social Sciences → Decision Sciences → Management Science and Operations Research

Data Management and Algorithms

Physical Sciences → Computer Science → Signal Processing

Web Data Mining and Analysis

Physical Sciences → Computer Science → Information Systems

Related Documents

Refining high-frequency-queries-based filter for similarity join

FINDING SETS OF HIGH-FREQUENCY QUERIES FOR HIGH-FREQUENCY-QUERY-BASED FILTER FOR SIMILARITY JOIN

Finding a set of high-frequency queries for high-frequency-query-based filter for similarity join

Cluster Analysis to Find Sets of High-frequency Queries for Filtering in Similarity Join

Recommending Join Queries Based on Path Frequency