Detecting data errors: Where are we and what needs to be done?
Data cleaning has played a critical role in ensuring data quality for enterprise applications.
Naturally, there has been extensive research in this area, and many data cleaning …
Steering data quality with visual analytics: The complexity challenge
Data quality management, especially data cleansing, has been extensively studied for many
years in the areas of data management and visual analytics. In the paper, we first review and …
Data profiling: A tutorial
is to understand the dataset at hand and its metadata. The process of metadata discovery is
known as data profiling. Profiling activities range from ad-hoc approaches, such as eye …
HoloDetect: Few-shot learning for error detection
We introduce a few-shot learning framework for error detection. We show that data
augmentation (a form of weak supervision) is key to training high-quality, ML-based error …
[BOOK] Data profiling
Data profiling refers to the activity of collecting data about data, i.e., metadata. Most IT
professionals and researchers who work with data have engaged in data profiling, at least …
Rotom: A meta-learned data augmentation framework for entity matching, data cleaning, text classification, and beyond
Deep Learning revolutionizes almost all fields of computer science including data
management. However, the demand for high-quality training data is slowing down deep …
Robust discovery of positive and negative rules in knowledge bases
We present RUDIK, a system for the discovery of declarative rules over knowledge-bases
(KBs). RUDIK discovers rules that express positive relationships between entities, such as "if …
Sudowoodo: Contrastive self-supervised learning for multi-purpose data integration and preparation
Machine learning (ML) is playing an increasingly important role in data management tasks,
particularly in Data Integration and Preparation (DI&P). The success of ML-based …
SLiMFast: Guaranteed results for data fusion and source reliability
We focus on data fusion, i.e., the problem of unifying conflicting data from data sources into a
single representation by estimating the source accuracies. We propose SLiMFast, a …
Interactive cleaning for progressive visualization through composite questions
In this paper, we study the problem of interactive cleaning for progressive visualization
(ICPV): Given a bad visualization V, it is to obtain a "cleaned" visualization V whose distance …