JOURNAL ARTICLE

Selective provenance for datalog programs using top-k queries

Daniel Deutch, Amir Gilad, Yuval Moskovitch

Year: 2015 Journal: Proceedings of the VLDB Endowment Vol: 8 (12) Pages: 1394-1405 Publisher: Association for Computing Machinery

Abstract

Highly expressive declarative languages, such as datalog, are now commonly used to model the operational logic of data-intensive applications. The typical complexity of such datalog programs, and the large volume of data that they process, call for result explanation. Results may be explained through the tracking and presentation of data provenance, and here we focus on a detailed form of provenance (how-provenance), defining it as the set of derivation trees of a given fact. While informative, the size of such full provenance information is typically too large and complex (even when compactly represented) to allow displaying it to the user. To this end, we propose a novel top-k query language for querying datalog provenance, supporting selection criteria based on tree patterns and ranking based on the rules and database facts used in derivation. We propose an efficient novel algorithm based on (1) instrumenting the datalog program so that, upon evaluation, it generates only relevant provenance, and (2) efficient top-k (relevant) provenance generation, combined with bottom-up datalog evaluation. The algorithm computes in polynomial data complexity a compact representation of the top-k trees which may be explicitly constructed in linear time with respect to their size. We further experimentally study the algorithm performance, showing its scalability even for complex datalog programs where full provenance tracking is infeasible.
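To make the notions of derivation tree and top-k ranking concrete, the following is a minimal sketch and not the algorithm of the paper: it omits the tree-pattern selection and the program instrumentation step entirely. It evaluates a two-rule reachability program bottom-up and keeps, for every derived fact, only the k cheapest derivation trees. The edge costs, the rule labels r1 and r2, the additive cost model, and the function name top_k_derivations are all assumptions made for illustration.

```python
# Toy program (assumed for illustration, not taken from the paper):
#   r1: path(X, Y) :- edge(X, Y).
#   r2: path(X, Y) :- edge(X, Z), path(Z, Y).
# Each edge fact carries a hypothetical positive cost; a derivation tree's
# cost is the sum of the costs of the edge facts at its leaves.
EDGES = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 5, ("b", "a"): 1}


def top_k_derivations(edges, k):
    """Map each path fact (x, y) to its k cheapest (cost, tree) pairs."""
    best = {}  # (x, y) -> sorted list of (cost, tree), at most k entries
    changed = True
    while changed:  # naive bottom-up evaluation until a fixpoint is reached
        changed = False
        candidates = {}
        for (x, z), w in edges.items():
            leaf = ("edge", x, z)
            # r1: a single edge is a derivation tree of path(x, z).
            candidates.setdefault((x, z), set()).add((w, ("r1", leaf)))
            # r2: extend every retained tree of path(z, y) with edge(x, z).
            for (z2, y), trees in best.items():
                if z2 == z:
                    for cost, sub in trees:
                        tree = ("r2", leaf, sub)
                        candidates.setdefault((x, y), set()).add((w + cost, tree))
        for fact, cands in candidates.items():
            # Merge with the previous round and keep only the k cheapest trees.
            merged = sorted(cands | set(best.get(fact, [])))[:k]
            if merged != best.get(fact):
                best[fact] = merged
                changed = True
    return best


result = top_k_derivations(EDGES, 2)
for cost, tree in result[("a", "c")]:
    print(cost, tree)
```

Because the graph contains the cycle a, b, a, the fact path(a, c) has infinitely many derivation trees, which is why full how-provenance cannot simply be enumerated. Pruning to k trees per fact is what lets this sketch terminate; it relies on the assumed costs being strictly positive.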

Keywords:
Datalog, Computer science, Provenance, Scalability, Theoretical computer science, Tree (set theory), Set (abstract data type), Programming language, Time complexity, Database, Algorithm, Mathematics

Metrics

Cited By: 34
FWCI (Field Weighted Citation Impact): 11.00
Refs: 53
Citation Normalized Percentile: 0.98
Is in top 1%
Is in top 10%

Topics

Scientific Computing and Data Management
Social Sciences →  Decision Sciences →  Information Systems and Management
Advanced Database Systems and Queries
Physical Sciences →  Computer Science →  Computer Networks and Communications
Distributed and Parallel Computing Systems
Physical Sciences →  Computer Science →  Computer Networks and Communications

Related Documents

JOURNAL ARTICLE

Efficient provenance tracking for datalog using top-k queries

Daniel Deutch, Amir Gilad, Yuval Moskovitch

Journal: The VLDB Journal Year: 2018 Vol: 27 (2) Pages: 245-269

JOURNAL ARTICLE

Provenance-guided synthesis of Datalog programs

Mukund Raghothaman, Jonathan Mendelson, David Zhao, Mayur Naik, Bernhard Scholz

Journal: Proceedings of the ACM on Programming Languages Year: 2019 Vol: 4 (POPL) Pages: 1-27

JOURNAL ARTICLE

Below and Above Why-Provenance for Datalog Queries

Marco Calautti, Ester Livshits, Andréas Pieris, Markus Schneider

Journal: Proceedings of the ACM on Management of Data Year: 2024 Vol: 2 (5) Pages: 1-21

JOURNAL ARTICLE

The Complexity of Why-Provenance for Datalog Queries

Marco Calautti, Ester Livshits, Andréas Pieris, Markus Schneider

Journal: Proceedings of the ACM on Management of Data Year: 2024 Vol: 2 (2) Pages: 1-16

BOOK-CHAPTER

Selective-NRA Algorithms for Top-k Queries

Jing Yuan, Guangzhong Sun, Ye Tian, Guoliang Chen, Zhi Liu

Lecture notes in computer science Year: 2009 Pages: 15-26