Smoke: Fine-grained lineage at interactive speed

F Psallidas, E Wu - arXiv preprint arXiv:1801.07237, 2018 - arxiv.org
Data lineage describes the relationship between individual input and output data items of a
workflow, and has served as an integral ingredient for both traditional (e.g., debugging …

Capturing and querying fine-grained provenance of preprocessing pipelines in data science

A Chapman, P Missier, G Simonelli… - Proceedings of the VLDB …, 2020 - dl.acm.org
Data processing pipelines that are designed to clean, transform and alter data in preparation
for learning predictive models, have an impact on those models' accuracy and performance …

Data provenance

B Glavic - Foundations and Trends® in Databases, 2021 - nowpublishers.com
Data provenance has evolved from a niche topic to a mainstream area of research in
databases and other research communities. This article gives a comprehensive introduction …

Bao: Learning to steer query optimizers

R Marcus, P Negi, H Mao, N Tatbul, M Alizadeh… - arXiv preprint arXiv …, 2020 - arxiv.org
Query optimization remains one of the most challenging problems in data management
systems. Recent efforts to apply machine learning techniques to query optimization …

Supporting Better Insights of Data Science Pipelines with Fine-grained Provenance

A Chapman, L Lauro, P Missier, R Torlone - ACM Transactions on …, 2024 - dl.acm.org
Successful data-driven science requires complex data engineering pipelines to clean,
transform, and alter data in preparation for machine learning, and robust results can only be …

GProM - a Swiss army knife for your provenance needs

BS Arab, S Feng, B Glavic, S Lee, X Niu… - A Quarterly bulletin of the …, 2018 - par.nsf.gov
We present an overview of GProM, a generic provenance middleware for relational
databases. The system supports diverse provenance and annotation management tasks …

PUG: a framework and practical implementation for why and why-not provenance

S Lee, B Ludäscher, B Glavic - The VLDB Journal, 2019 - Springer
Explaining why an answer is (or is not) returned by a query is important for many
applications including auditing, debugging data and queries, and answering hypothetical …

In-memory blockchain: Toward efficient and trustworthy data provenance for HPC systems

A Al-Mamun, T Li, M Sadoghi… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
The state-of-the-art approaches for tracking data provenance on high-performance
computing (HPC) systems are either supported by file systems or relational databases …

You Say 'What', I Hear 'Where' and 'Why': (Mis-)Interpreting SQL to Derive Fine-Grained Provenance

T Müller, B Dietrich, T Grust - arXiv preprint arXiv:1805.11517, 2018 - arxiv.org
SQL declaratively specifies what the desired output of a query is. This work shows that a non-
standard interpretation of the SQL semantics can, instead, disclose where a piece of the …

Toward accurate and efficient emulation of public blockchains in the cloud

X Wang, A Al-Mamun, F Yan, D Zhao - … Conference, Held as Part of the …, 2019 - Springer
Blockchain is an enabler of many emerging decentralized applications in areas of
cryptocurrency, Internet of Things, smart healthcare, among many others. Although various …