Profiling relational data: a survey

Z Abedjan, L Golab, F Naumann - The VLDB Journal, 2015 - Springer
Profiling data to determine metadata about a given dataset is an important and frequent
activity of any IT professional and researcher and is necessary for various use-cases. It …

Quantitative data cleaning for large databases

JM Hellerstein - 2013 - biblioteca.unisced.edu.mz
Data collection has become a ubiquitous function of large organizations, not only for record
keeping, but to support a variety of data analysis tasks that are critical to the organizational …

Data profiling revisited

F Naumann - ACM SIGMOD Record, 2014 - dl.acm.org
Data profiling comprises a broad range of methods to efficiently analyze a given data set. In
a typical scenario, which mirrors the capabilities of commercial data profiling tools, tables of …

[BOOK][B] Data profiling

Z Abedjan, L Golab, F Naumann, T Papenbrock - 2019 - Springer
Data profiling refers to the activity of collecting data about data, i.e., metadata. Most IT
professionals and researchers who work with data have engaged in data profiling, at least …

Efficient discovery of approximate dependencies

S Kruse, F Naumann - Proceedings of the VLDB Endowment, 2018 - dl.acm.org
Functional dependencies (FDs) and unique column combinations (UCCs) form a valuable
ingredient for many data management tasks, such as data cleaning, schema recovery, and …

Discovering data quality problems: the case of repurposed data

R Zhang, M Indulska, S Sadiq - Business & Information Systems …, 2019 - Springer
Existing methodologies for identifying data quality problems are typically user-centric, where
data quality requirements are first determined in a top-down manner following well …

Connecting databases with process mining: a meta model and toolset

E González López de Murillas, HA Reijers… - Software & Systems …, 2019 - Springer
Process mining techniques require event logs which, in many cases, are obtained from
databases. Obtaining these event logs is not a trivial task and requires substantial domain …

Event correlation for process discovery from web service interaction logs

HR Motahari-Nezhad, R Saint-Paul, F Casati… - The VLDB Journal, 2011 - Springer
Understanding, analyzing, and ultimately improving business processes is a goal of
enterprises today. These tasks are challenging as business processes in modern …

Query reverse engineering

QT Tran, CY Chan, S Parthasarathy - The VLDB Journal, 2014 - Springer
In this paper, we introduce a new problem termed query reverse engineering (QRE). Given a
database D and a result table T, the output of some known or unknown query Q on D …

Scalable discovery of unique column combinations

A Heise, JA Quiané-Ruiz, Z Abedjan… - Proceedings of the …, 2013 - dl.acm.org
The discovery of all unique (and non-unique) column combinations in a given dataset is at
the core of any data profiling effort. The results are useful for a large number of areas of data …