JOURNAL ARTICLE

The Complexity of Why-Provenance for Datalog Queries

Marco Calautti, Ester Livshits, Andréas Pieris, Markus Schneider

Year: 2024   Journal: Proceedings of the ACM on Management of Data   Vol: 2 (2)   Pages: 1-16   Publisher: Association for Computing Machinery

Abstract

Datalog is a powerful rule-based language that allows us to express complex recursive queries and has found numerous applications over the years. Explaining why a result to a Datalog query is obtained is an essential task towards explainable and transparent data-intensive applications that rely on Datalog. A standard way of explaining a query result is the so-called why-provenance, which provides information about the witnesses to a query result in the form of subsets of the input database that as a whole can be used to derive that result. To our surprise, despite the fact that the notion of why-provenance for Datalog queries has been around for decades and intensively studied, its computational complexity remains unexplored. Our goal is to fill this gap in the why-provenance literature. Towards this end, we pinpoint the data complexity of why-provenance for Datalog queries and key subclasses thereof. The takeaway of our work is that why-provenance for recursive queries, even if the recursion is limited to be linear, is an intractable problem, whereas for non-recursive queries it is highly tractable.
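
As an illustration of the notion described in the abstract, the following is a minimal Python sketch that enumerates why-provenance for a small linear recursive program. It is not the authors' method. It assumes the common reading in which a witness is the set of database facts at the leaves of some proof tree, and it uses a hypothetical three-edge database with the reachability program shown in the comments. The names EDGES and why_provenance are invented for this example.

```python
# Hypothetical linear recursive Datalog program (an assumption for illustration):
#   path(x, y) :- edge(x, y).
#   path(x, z) :- path(x, y), edge(y, z).
EDGES = {("a", "b"), ("b", "c"), ("a", "c")}


def why_provenance(edges):
    """Map each derivable path fact to the set of its witnesses.

    A witness is a frozenset of edge facts that forms the leaf set of
    some proof tree of the fact (assumed definition, see lead-in).
    """
    supports = {}
    # Base rule: every edge(x, y) yields path(x, y) with witness {edge(x, y)}.
    for (x, y) in edges:
        supports.setdefault(("path", x, y), set()).add(
            frozenset([("edge", x, y)])
        )
    # Recursive rule: extend each known witness of path(x, y) by edge(y, z),
    # repeating until no new witness appears (naive fixpoint).
    changed = True
    while changed:
        changed = False
        for (_, x, y), witnesses in list(supports.items()):
            for (u, z) in edges:
                if u != y:
                    continue
                target = supports.setdefault(("path", x, z), set())
                for w in list(witnesses):
                    extended = w | {("edge", u, z)}
                    if extended not in target:
                        target.add(extended)
                        changed = True
    return supports


prov = why_provenance(EDGES)
for witness in sorted(prov[("path", "a", "c")], key=len):
    print(sorted(witness))
```

Here path(a, c) has two witnesses: the single fact edge(a, c), and the pair edge(a, b), edge(b, c). The sketch enumerates every witness explicitly, and the number of witnesses can grow exponentially with the size of the database, so it is only suitable for toy inputs.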

Keywords:
Datalog, Provenance, Computer science, Programming language, Geology, Petrology

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 4.68
Refs: 14
Citation Normalized Percentile: 0.91

Topics

Scientific Computing and Data Management
Social Sciences → Decision Sciences → Information Systems and Management
Advanced Database Systems and Queries
Physical Sciences → Computer Science → Computer Networks and Communications
Data Quality and Management
Social Sciences → Decision Sciences → Management Science and Operations Research

Related Documents

JOURNAL ARTICLE

Below and Above Why-Provenance for Datalog Queries

Marco Calautti, Ester Livshits, Andréas Pieris, Markus Schneider

Journal: Proceedings of the ACM on Management of Data   Year: 2024   Vol: 2 (5)   Pages: 1-21

JOURNAL ARTICLE

Computing the Why-Provenance for Datalog Queries via SAT Solvers

Marco Calautti, Ester Livshits, Andréas Pieris, Markus Schneider

Journal: Proceedings of the AAAI Conference on Artificial Intelligence   Year: 2024   Vol: 38 (9)   Pages: 10459-10466

JOURNAL ARTICLE

Efficient provenance tracking for datalog using top-k queries

Daniel Deutch, Amir Gilad, Yuval Moskovitch

Journal: The VLDB Journal   Year: 2018   Vol: 27 (2)   Pages: 245-269

JOURNAL ARTICLE

Selective provenance for datalog programs using top-k queries

Daniel Deutch, Amir Gilad, Yuval Moskovitch

Journal: Proceedings of the VLDB Endowment   Year: 2015   Vol: 8 (12)   Pages: 1394-1405