A survey on provenance: What for? What form? What from?

M Herschel, R Diestelkämper, H Ben Lahmar - The VLDB Journal, 2017 - Springer
Provenance refers to any information describing the production process of an end product,
which can be anything from a piece of digital data to a physical object. While this survey …

Managing messes in computational notebooks

A Head, F Hohman, T Barik, SM Drucker… - Proceedings of the 2019 …, 2019 - dl.acm.org
Data analysts use computational notebooks to write code for analyzing and visualizing data.
Notebooks help analysts iteratively write analysis code by letting them interleave code with …

A survey on collecting, managing, and analyzing provenance from scripts

JF Pimentel, J Freire, L Murta… - ACM Computing Surveys …, 2019 - dl.acm.org
Scripts are widely used to design and run scientific experiments. Scripting languages are
easy to learn and use, and they allow complex tasks to be specified and executed in fewer …

Capturing and querying fine-grained provenance of preprocessing pipelines in data science

A Chapman, P Missier, G Simonelli… - Proceedings of the VLDB …, 2020 - dl.acm.org
Data processing pipelines that are designed to clean, transform and alter data in preparation
for learning predictive models, have an impact on those models' accuracy and performance …

noWorkflow: a tool for collecting, analyzing, and managing provenance from Python scripts

JF Pimentel, L Murta, V Braganholo… - Proceedings of the VLDB …, 2017 - par.nsf.gov
We present noWorkflow, an open-source tool that systematically and transparently collects
provenance from Python scripts, including data about the script execution and how the script …

Supporting better insights of data science pipelines with fine-grained provenance

A Chapman, L Lauro, P Missier, R Torlone - ACM Transactions on …, 2024 - dl.acm.org
Successful data-driven science requires complex data engineering pipelines to clean,
transform, and alter data in preparation for machine learning, and robust results can only be …

Yin & Yang: demonstrating complementary provenance from noWorkflow & YesWorkflow

JF Pimentel, S Dey, T McPhillips, K Belhajjame… - … VA, USA, June 7-8, 2016 …, 2016 - Springer
The noWorkflow and YesWorkflow toolkits both enable researchers to capture, store, query,
and visualize the provenance of results produced by scripts that process scientific data …

Tracking and analyzing the evolution of provenance from scripts

JF Pimentel, J Freire, V Braganholo, L Murta - … McLean, VA, USA, June 7-8 …, 2016 - Springer
Script languages are powerful tools for scientists. Scientists use them to process data,
invoke programs, and link program outputs/inputs. During the life cycle of scientific …

Versioned-PROV: A PROV extension to support mutable data entities

JFN Pimentel, P Missier, L Murta… - … and Annotation Workshop, 2018 - Springer
The PROV data model assumes that entities are immutable and all changes to an entity e
are represented by the creation of a new entity e'. This is reasonable for many provenance …

Using introspection to collect provenance in R

B Lerner, E Boose, L Perez - Informatics, 2018 - mdpi.com
Data provenance is the history of an item of data from the point of its creation to its present
state. It can support science by improving understanding of and confidence in data …