Profiling relational data: a survey

Z Abedjan, L Golab, F Naumann - The VLDB Journal, 2015 - Springer
Profiling data to determine metadata about a given dataset is an important and frequent
activity of any IT professional and researcher and is necessary for various use-cases. It …

Data profiling: A tutorial

Z Abedjan, L Golab, F Naumann - Proceedings of the 2017 ACM …, 2017 - dl.acm.org
… is to understand the dataset at hand and its metadata. The process of metadata discovery is
known as data profiling. Profiling activities range from ad-hoc approaches, such as eye …

Data quality: The other face of big data

B Saha, D Srivastava - 2014 IEEE 30th international conference …, 2014 - ieeexplore.ieee.org
In our Big Data era, data is being generated, collected and analyzed at an unprecedented
scale, and data-driven decision making is sweeping through all aspects of society. Recent …

Conditional functional dependencies for capturing data inconsistencies

W Fan, F Geerts, X Jia, A Kementsietsidis - ACM Transactions on …, 2008 - dl.acm.org
We propose a class of integrity constraints for relational databases, referred to as conditional
functional dependencies (CFDs), and study their applications in data cleaning. In contrast to …

Discovering conditional functional dependencies

W Fan, F Geerts, J Li, M Xiong - IEEE Transactions on …, 2010 - ieeexplore.ieee.org
This paper investigates the discovery of conditional functional dependencies (CFDs). CFDs
are a recent extension of functional dependencies (FDs) by supporting patterns of …

Data profiling revisited

F Naumann - ACM SIGMOD Record, 2014 - dl.acm.org
Data profiling comprises a broad range of methods to efficiently analyze a given data set. In
a typical scenario, which mirrors the capabilities of commercial data profiling tools, tables of …

[BOOK] Data profiling

Z Abedjan, L Golab, F Naumann, T Papenbrock - 2019 - Springer
Data profiling refers to the activity of collecting data about data, i.e., metadata. Most IT
professionals and researchers who work with data have engaged in data profiling, at least …

Guided data repair

M Yakout, AK Elmagarmid, J Neville, M Ouzzani… - arXiv preprint arXiv …, 2011 - arxiv.org
In this paper we present GDR, a Guided Data Repair framework that incorporates user
feedback in the cleaning process to enhance and accelerate existing automatic repair …

Discovering data quality rules

F Chiang, RJ Miller - Proceedings of the VLDB Endowment, 2008 - dl.acm.org
Dirty data is a serious problem for businesses leading to incorrect decision making,
inefficient daily operations, and ultimately wasting both time and money. Dirty data often …

Dependencies revisited for improving data quality

W Fan - Proceedings of the twenty-seventh ACM SIGMOD …, 2008 - dl.acm.org
Dependency theory is almost as old as relational databases themselves, and has
traditionally been used to improve the quality of schema, among other things. Recently there …