Big data provenance: Challenges, state of the art and opportunities

J Wang, D Crawl, S Purawat… - … conference on big …, 2015 - ieeexplore.ieee.org
Ability to track provenance is a key feature of scientific workflows to support data lineage and
reproducibility. The challenges that are introduced by the volume, variety and velocity of Big …

A systematic review of provenance systems

B Pérez, J Rubio, C Sáenz-Adán - Knowledge and Information Systems, 2018 - Springer
Provenance refers to the entire amount of information, comprising all the elements and their
relationships, that contribute to the existence of a piece of data. The knowledge of …

A survey on provenance: What for? What form? What from?

M Herschel, R Diestelkämper, H Ben Lahmar - The VLDB Journal, 2017 - Springer
Provenance refers to any information describing the production process of an end product,
which can be anything from a piece of digital data to a physical object. While this survey …

Production machine learning pipelines: Empirical analysis and optimization opportunities

D Xin, H Miao, A Parameswaran… - Proceedings of the 2021 …, 2021 - dl.acm.org
Machine learning (ML) is now commonplace, powering data-driven applications in various
organizations. Unlike the traditional perception of ML in research, ML production pipelines …

Titian: Data provenance support in spark

M Interlandi, K Shah, SD Tetali… - Proceedings of the …, 2015 - pmc.ncbi.nlm.nih.gov
Debugging data processing logic in Data-Intensive Scalable Computing (DISC) systems is a
difficult and time consuming effort. Today's DISC systems offer very little tooling for …

Data provenance

B Glavic - Foundations and Trends® in Databases, 2021 - nowpublishers.com
Data provenance has evolved from a niche topic to a mainstream area of research in
databases and other research communities. This article gives a comprehensive introduction …

The semiring framework for database provenance

TJ Green, V Tannen - Proceedings of the 36th ACM SIGMOD-SIGACT …, 2017 - dl.acm.org
Imagine a computational process that uses a complex input consisting of multiple "items" (e.g.,
files, tables, tuples, parameters, configuration rules). The provenance analysis of such a …

Improving reproducibility of data science pipelines through transparent provenance capture

L Rupprecht, JC Davis, C Arnold, Y Gur… - Proceedings of the …, 2020 - dl.acm.org
Data science has become prevalent in a large variety of domains. Inherent in its practice is
an exploratory, probing, and fact finding journey, which consists of the assembly, adaptation …

Data x-ray: A diagnostic tool for data errors

X Wang, XL Dong, A Meliou - Proceedings of the 2015 ACM SIGMOD …, 2015 - dl.acm.org
A lot of systems and applications are data-driven, and the correctness of their operation
relies heavily on the correctness of their data. While existing data cleaning techniques can …

IoT data provenance implementation challenges

A Alkhalil, RA Ramadan - Procedia Computer Science, 2017 - Elsevier
Internet of Things (IoT) has become an emerging trend in the information and engineering
industries, which is inevitable to impact our lives in various ways. The advancement of IoT as …