Detecting data errors: Where are we and what needs to be done?

Z Abedjan, X Chu, D Deng, RC Fernandez… - Proceedings of the …, 2016 - dl.acm.org
Data cleaning has played a critical role in ensuring data quality for enterprise applications.
Naturally, there has been extensive research in this area, and many data cleaning …

Steering data quality with visual analytics: The complexity challenge

S Liu, G Andrienko, Y Wu, N Cao, L Jiang, C Shi… - Visual Informatics, 2018 - Elsevier
Data quality management, especially data cleansing, has been extensively studied for many
years in the areas of data management and visual analytics. In the paper, we first review and …

Data profiling: A tutorial

Z Abedjan, L Golab, F Naumann - Proceedings of the 2017 ACM …, 2017 - dl.acm.org
is to understand the dataset at hand and its metadata. The process of metadata discovery is
known as data profiling. Profiling activities range from ad-hoc approaches, such as eye …

HoloDetect: Few-shot learning for error detection

A Heidari, J McGrath, IF Ilyas… - Proceedings of the 2019 …, 2019 - dl.acm.org
We introduce a few-shot learning framework for error detection. We show that data
augmentation (a form of weak supervision) is key to training high-quality, ML-based error …

[BOOK] Data profiling

Z Abedjan, L Golab, F Naumann, T Papenbrock - 2019 - Springer
Data profiling refers to the activity of collecting data about data, i.e., metadata. Most IT
professionals and researchers who work with data have engaged in data profiling, at least …

Rotom: A meta-learned data augmentation framework for entity matching, data cleaning, text classification, and beyond

Z Miao, Y Li, X Wang - … of the 2021 International Conference on …, 2021 - dl.acm.org
Deep Learning revolutionizes almost all fields of computer science including data
management. However, the demand for high-quality training data is slowing down deep …

Robust discovery of positive and negative rules in knowledge bases

S Ortona, VV Meduri, P Papotti - 2018 IEEE 34th International …, 2018 - ieeexplore.ieee.org
We present RUDIK, a system for the discovery of declarative rules over knowledge-bases
(KBs). RUDIK discovers rules that express positive relationships between entities, such as "if …

Sudowoodo: Contrastive self-supervised learning for multi-purpose data integration and preparation

R Wang, Y Li, J Wang - 2023 IEEE 39th International …, 2023 - ieeexplore.ieee.org
Machine learning (ML) is playing an increasingly important role in data management tasks,
particularly in Data Integration and Preparation (DI&P). The success of ML-based …

SLiMFast: Guaranteed results for data fusion and source reliability

T Rekatsinas, M Joglekar, H Garcia-Molina… - Proceedings of the …, 2017 - dl.acm.org
We focus on data fusion, i.e., the problem of unifying conflicting data from data sources into a
single representation by estimating the source accuracies. We propose SLiMFast, a …

Interactive cleaning for progressive visualization through composite questions

Y Luo, C Chai, X Qin, N Tang… - 2020 IEEE 36th …, 2020 - ieeexplore.ieee.org
In this paper, we study the problem of interactive cleaning for progressive visualization
(ICPV): Given a bad visualization V, it is to obtain a "cleaned" visualization V whose distance …