The annotation of the results of database queries with prove-nance information has many applications. This paper stud-ies provenance for datalog queries. We start by consider-ing provenance representation by (positive) Boolean expres-sions, as pioneered in the theories of incomplete and prob-abilistic databases. We show that even for linear datalog programs the representation of provenance using Boolean expressions incurs a super-polynomial size blowup in data complexity. We address this with an approach that is novel in provenance studies, showing that we can construct in PTIME poly-size (data complexity) provenance represen-tations as Boolean circuits. Then we present optimization techniques that embed the construction of circuits into semi-naive datalog evaluation, and further reduce the size of the circuits. We also illustrate the usefulness of our approach in multiple application domains such as query evaluation in probabilistic databases, and in deletion propagation. Next, we study the possibility of extending the circuit approach to the more general framework of semiring annotations intro-duced in earlier work. We show that for a large and useful class of provenance semirings, we can construct in PTIME poly-size circuits that capture the provenance.
Mukund RaghothamanJonathan MendelsonDavid ZhaoMayur NaikBernhard Scholz
Marco CalauttiEster LivshitsAndréas PierisMarkus Schneider
Marco CalauttiEster LivshitsAndréas PierisMarkus Schneider
Austen Z. FanParaschos KoutrisSudeepa Roy
Daniel DeutchAmir GiladYuval Moskovitch