Abstract

The annotation of the results of database queries with provenance information has many applications. This paper studies provenance for datalog queries. We start by considering provenance representation by (positive) Boolean expressions, as pioneered in the theories of incomplete and probabilistic databases. We show that even for linear datalog programs the representation of provenance using Boolean expressions incurs a super-polynomial size blowup in data complexity. We address this with an approach that is novel in provenance studies, showing that we can construct in PTIME poly-size (data complexity) provenance representations as Boolean circuits. Then we present optimization techniques that embed the construction of circuits into semi-naive datalog evaluation, and further reduce the size of the circuits. We also illustrate the usefulness of our approach in multiple application domains such as query evaluation in probabilistic databases, and in deletion propagation. Next, we study the possibility of extending the circuit approach to the more general framework of semiring annotations introduced in earlier work. We show that for a large and useful class of provenance semirings, we can construct in PTIME poly-size circuits that capture the provenance.
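To make the circuit idea concrete, the following is a minimal sketch and not the paper's construction. It assumes the linear transitive-closure program T(x,y) :- E(x,y) and T(x,y) :- T(x,z), E(z,y), and builds a layered positive Boolean circuit in which each round permits one more application of the recursive rule. The names build_tc_circuit and evaluate, and the gate encoding, are hypothetical choices made for this illustration. Identical gates are shared through a cache, so the gate count stays polynomial in the number of edges even when the equivalent Boolean expression would enumerate many paths.

```python
def build_tc_circuit(edges):
    """Sketch: layered provenance circuit for
         T(x,y) :- E(x,y).
         T(x,y) :- T(x,z), E(z,y).
    Returns (gates, outputs); outputs maps each derived fact (x, y) to a gate id."""
    nodes = {v for e in edges for v in e}
    gates = []   # gates[i] = (op, inputs), op in {"var", "and", "or"}
    cache = {}   # shares structurally identical gates

    def gate(op, inputs):
        key = (op, inputs)
        if key not in cache:
            cache[key] = len(gates)
            gates.append(key)
        return cache[key]

    # One input gate per edge fact E(x, y).
    var = {e: gate("var", (e,)) for e in edges}

    # Layer 0: facts derivable with the base rule only.
    layer = dict(var)

    # Assumption for this sketch: len(nodes) rounds are enough, because any
    # longer walk is absorbed by a shorter one under Boolean semantics.
    for _ in range(len(nodes)):
        inputs_for = {}
        # Recursive rule: extend every T(x, z) of the previous layer by E(z, y).
        for (x, z), g in layer.items():
            for (z2, y) in edges:
                if z2 == z:
                    inputs_for.setdefault((x, y), []).append(
                        gate("and", (g, var[(z, y)])))
        # Base rule: every edge is also a T fact.
        for e in edges:
            inputs_for.setdefault(e, []).append(var[e])
        # OR together all derivations of the same fact.
        layer = {
            fact: ins[0] if len(ins) == 1 else gate("or", tuple(ins))
            for fact, ins in inputs_for.items()
        }
    return gates, layer


def evaluate(gates, out, present):
    """Evaluate gate `out` when exactly the edges in `present` are true."""
    val = []
    for op, ins in gates:  # gates are stored after their inputs
        if op == "var":
            val.append(ins[0] in present)
        elif op == "and":
            val.append(all(val[i] for i in ins))
        else:
            val.append(any(val[i] for i in ins))
    return val[out]
```

Evaluating the output gate of a fact under a set of retained edges answers a deletion-propagation style question: the fact survives exactly when its gate evaluates to true.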

Keywords:
P, Datalog, Computer science, Theoretical computer science, Probabilistic logic, Representation (politics), Construct (python library), Provenance, Conjunctive query, Boolean expression, Class (philosophy), Semiring, Boolean circuit, Annotation, Time complexity, Database, Algorithm, Programming language, Boolean function, Mathematics, Discrete mathematics, Artificial intelligence

Metrics

Cited By: 50
FWCI (Field Weighted Citation Impact): 0.00
Refs: 39

Topics

Scientific Computing and Data Management
Social Sciences →  Decision Sciences →  Information Systems and Management
Advanced Database Systems and Queries
Physical Sciences →  Computer Science →  Computer Networks and Communications
Data Quality and Management
Social Sciences →  Decision Sciences →  Management Science and Operations Research
