The Complexity of Why-Provenance for Datalog Queries

Marco Calautti; Ester Livshits; Andréas Pieris; Markus Schneider

doi:10.1145/3651146

JOURNAL ARTICLE

Year: 2024; Journal: Proceedings of the ACM on Management of Data; Vol: 2 (2); Pages: 1-16; Publisher: Association for Computing Machinery

Abstract

Datalog is a powerful rule-based language that allows us to express complex recursive queries and has found numerous applications over the years. Explaining why a result to a Datalog query is obtained is an essential task towards explainable and transparent data-intensive applications that rely on Datalog. A standard way of explaining a query result is the so-called why-provenance, which provides information about the witnesses to a query result in the form of subsets of the input database that as a whole can be used to derive that result. To our surprise, despite the fact that the notion of why-provenance for Datalog queries has been around for decades and intensively studied, its computational complexity remains unexplored. Our goal is to fill this gap in the why-provenance literature. Towards this end, we pinpoint the data complexity of why-provenance for Datalog queries and key subclasses thereof. The takeaway of our work is that why-provenance for recursive queries, even if the recursion is limited to be linear, is an intractable problem, whereas for non-recursive queries it is highly tractable.
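To make the notion of a witness concrete, the following is a minimal sketch, not the authors' method. It assumes an illustrative linear-recursive program, path(x, y) :- edge(x, y) and path(x, y) :- edge(x, z), path(z, y), and treats a witness as the set of database facts at the leaves of some proof tree. The function name `why_provenance` and the edge/path program are hypothetical choices made for this example. The naive fixpoint below can produce exponentially many witnesses, which is in line with the intractability the abstract reports for recursive queries.

```python
def why_provenance(edges):
    """Map each derivable fact path(x, y) to its set of witnesses.

    A witness is a frozenset of edge facts that occur as the leaves of
    some proof tree of path(x, y). Illustrative sketch only.
    """
    edges = set(edges)
    prov = {}  # (x, y) -> set of frozensets of edge facts
    changed = True
    while changed:  # naive fixpoint; terminates since subsets are finite
        changed = False
        # Rule 1: path(x, y) :- edge(x, y).
        for (x, y) in edges:
            witness = frozenset({(x, y)})
            target = prov.setdefault((x, y), set())
            if witness not in target:
                target.add(witness)
                changed = True
        # Rule 2: path(x, y) :- edge(x, z), path(z, y).
        for (x, z) in edges:
            # Iterate over snapshots because prov grows inside the loop.
            for (z2, y), witnesses in list(prov.items()):
                if z2 != z:
                    continue
                for sub in list(witnesses):
                    witness = sub | {(x, z)}
                    target = prov.setdefault((x, y), set())
                    if witness not in target:
                        target.add(witness)
                        changed = True
    return prov


# Hypothetical input database with three edge facts.
edges = {("a", "b"), ("b", "c"), ("a", "c")}
prov = why_provenance(edges)
print(sorted(sorted(w) for w in prov[("a", "c")]))
# prints [[('a', 'b'), ('b', 'c')], [('a', 'c')]]
```

Here path(a, c) has two witnesses: the single fact edge(a, c), and the pair edge(a, b), edge(b, c). On cyclic inputs the same procedure also returns non-minimal witnesses, since any proof tree counts, not only the smallest ones.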

Keywords:

Datalog; Provenance; Computer science; Programming language; Geology; Petrology

Metrics

FWCI (Field Weighted Citation Impact): 4.68

Citation Normalized Percentile: 0.91

Topics

Scientific Computing and Data Management

Social Sciences → Decision Sciences → Information Systems and Management

Advanced Database Systems and Queries

Physical Sciences → Computer Science → Computer Networks and Communications

Data Quality and Management

Social Sciences → Decision Sciences → Management Science and Operations Research

Related Documents

Below and Above Why-Provenance for Datalog Queries

Computing the Why-Provenance for Datalog Queries via SAT Solvers

Precise complexity analysis for efficient datalog queries

Efficient provenance tracking for datalog using top-k queries

Selective provenance for datalog programs using top-k queries