Answering top-k representative queries on graph databases

Sayan Ranu; Minh Hoang; Ambuj K. Singh

doi:10.1145/2588555.2610524

JOURNAL ARTICLE

Year: 2014 Pages: 1163-1174

Abstract

Given a function that classifies a data object as relevant or irrelevant, we consider the task of selecting k objects that best represent all relevant objects in the underlying database. This problem arises naturally when analysts want to familiarize themselves with the relevant objects in a database through a small set of k exemplars. In this paper, we solve the problem of top-k representative queries on graph databases. Graph databases model a wide range of scientific data, but solving the problem on graphs presents unique challenges due to the inherent complexity of matching structures. Furthermore, top-k representative queries map to the classic Set Cover problem, which makes them NP-hard. To overcome these challenges, we develop a greedy approximation with theoretical guarantees on the quality of the answer set, and note that no polynomial-time algorithm can achieve a better approximation unless P = NP. To reduce the quadratic computational cost of the greedy algorithm, we propose an index structure called NB-Index, which indexes the θ-neighborhoods of the database graphs using a novel combination of Lipschitz embedding and agglomerative clustering. Extensive experiments on real graph datasets validate the efficiency and effectiveness of the proposed techniques, which achieve speed-ups of up to two orders of magnitude over state-of-the-art algorithms.
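
The abstract combines two ideas: greedy selection over θ-neighborhoods, and a Lipschitz embedding that helps avoid expensive distance computations. The Python sketch below illustrates both under loudly stated assumptions: an object "represents" every relevant object within distance θ of it, the goal is to maximize how many relevant objects the k chosen ones cover, `dist` is a metric (so the embedding yields a lower bound), and the reference sets are chosen arbitrarily. The function names `lipschitz_embed`, `theta_neighborhoods`, and `greedy_top_k` are hypothetical, a one-dimensional distance stands in for a graph distance, and the agglomerative-clustering part of NB-Index is not modeled at all; this is not the authors' implementation.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple, TypeVar

T = TypeVar("T")


def lipschitz_embed(
    x: T,
    reference_sets: Sequence[Sequence[T]],
    dist: Callable[[T, T], float],
) -> Tuple[float, ...]:
    """Coordinate i is the distance from x to the closest member of reference set i."""
    return tuple(min(dist(x, r) for r in ref) for ref in reference_sets)


def theta_neighborhoods(
    objects: Sequence[T],
    dist: Callable[[T, T], float],
    theta: float,
    reference_sets: Sequence[Sequence[T]],
) -> Dict[int, Set[int]]:
    """Map each object index to the indices of all objects within theta (itself included)."""
    emb = [lipschitz_embed(x, reference_sets, dist) for x in objects]
    nbhd: Dict[int, Set[int]] = {i: {i} for i in range(len(objects))}
    for i in range(len(objects)):
        for j in range(i + 1, len(objects)):
            # Assuming dist is a metric, |d(x, R) - d(y, R)| <= d(x, y) for every
            # reference set R, so the L-infinity gap lower-bounds d(x, y).
            lower_bound = max(
                (abs(a - b) for a, b in zip(emb[i], emb[j])), default=0.0
            )
            if lower_bound > theta:
                continue  # pruned: cannot be within theta, so skip the costly dist call
            if dist(objects[i], objects[j]) <= theta:
                nbhd[i].add(j)
                nbhd[j].add(i)
    return nbhd


def greedy_top_k(nbhd: Dict[int, Set[int]], k: int) -> Tuple[List[int], Set[int]]:
    """Greedily pick up to k indices whose neighborhoods cover the most objects."""
    chosen: List[int] = []
    covered: Set[int] = set()
    for _ in range(min(k, len(nbhd))):
        # Marginal gain = number of not-yet-covered objects a candidate would add.
        # max() keeps the first maximum, so ties go to the earliest key in the dict.
        best = max(
            (i for i in nbhd if i not in chosen),
            key=lambda i: len(nbhd[i] - covered),
        )
        chosen.append(best)
        covered |= nbhd[best]
    return chosen, covered
```

On the toy input `[0, 1, 2, 10, 11, 20]` with `dist = lambda a, b: abs(a - b)`, θ = 1.5, reference sets `[[0], [20]]`, and k = 2, the pair (0, 2) is pruned by the lower bound without calling `dist`, and the greedy loop picks index 1 (covering 0, 1, 2) and then index 3 (covering 10 and 11), leaving 20 uncovered. For the standard maximum-coverage objective this greedy rule is known to give a (1 - 1/e) approximation; the abstract does not state the exact bound the paper proves. The sketch still compares all O(n²) embedding pairs and only saves distance calls; the abstract attributes further savings to NB-Index's combination of Lipschitz embedding with agglomerative clustering.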

Keywords:

Computer science, Graph database, Embedding, Cluster analysis, Theoretical computer science, Greedy algorithm, Graph, Approximation algorithm, Database, Data mining, Algorithm, Artificial intelligence

Metrics

FWCI (Field Weighted Citation Impact): 3.86

Citation Normalized Percentile: 0.95

Topics

Graph Theory and Algorithms

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Data Management and Algorithms

Physical Sciences → Computer Science → Signal Processing

Complexity and Algorithms in Graphs

Physical Sciences → Computer Science → Computational Theory and Mathematics

Related Documents

Answering Top-k Graph Similarity Queries in Graph Databases

Answering Top-k Keyword Queries on Relational Databases

Top-k Differential Queries in Graph Databases

Preference-Based Top-k Representative Skyline Queries on Uncertain Databases

Answering top-k queries using views