Data-centric artificial intelligence: A survey

D Zha, ZP Bhat, KH Lai, F Yang, Z Jiang… - ACM Computing …, 2025 - dl.acm.org
Artificial Intelligence (AI) is making a profound impact in almost every domain. A vital enabler
of its great success is the availability of abundant and high-quality data for building machine …

Data cleaning: Overview and emerging challenges

X Chu, IF Ilyas, S Krishnan, J Wang - Proceedings of the 2016 …, 2016 - dl.acm.org
Detecting and repairing dirty data is one of the perennial challenges in data analytics, and
failure to do so can result in inaccurate analytics and unreliable decisions. Over the past few …

[BOOK] Data cleaning

IF Ilyas, X Chu - 2019 - books.google.com
This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …

Discovering denial constraints

X Chu, IF Ilyas, P Papotti - Proceedings of the VLDB Endowment, 2013 - dl.acm.org
Integrity constraints (ICs) provide a valuable tool for enforcing correct application semantics.
However, designing ICs requires experts and time. Proposals for automatic discovery have …
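The paper above is about discovering denial constraints (DCs) automatically. The sketch below does not implement that discovery algorithm; it only illustrates what a given DC means by checking it against data. A DC forbids a conjunction of predicates over pairs of tuples t and t', so any pair satisfying every predicate is a violation. It assumes, as a simplification, that each predicate compares the same attribute of t and t' (general DCs also allow constants and comparisons across different attributes). The function name `dc_violations`, the predicate encoding, and the tax-style example rows are hypothetical.

```python
import operator
from itertools import permutations
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
# One predicate compares attribute `attr` of tuple t with the same attribute of t'
# using a binary operator such as operator.eq or operator.lt.
Predicate = Tuple[str, Callable[[Any, Any], bool]]


def dc_violations(rows: List[Row], predicates: List[Predicate]) -> List[Tuple[int, int]]:
    """Return ordered index pairs (i, j) for which every predicate holds.

    A denial constraint forbids the conjunction of its predicates, so each
    returned pair (rows[i] as t, rows[j] as t') is a violation.
    """
    violations = []
    # Predicates such as '>' are asymmetric, so check both orders of each pair.
    for i, j in permutations(range(len(rows)), 2):
        t, t_prime = rows[i], rows[j]
        if all(op(t[attr], t_prime[attr]) for attr, op in predicates):
            violations.append((i, j))
    return violations


# Hypothetical tax records and the DC
#   not(t.State = t'.State and t.Salary > t'.Salary and t.Rate < t'.Rate)
rows = [
    {"State": "NY", "Salary": 5000, "Rate": 3.0},
    {"State": "NY", "Salary": 6000, "Rate": 2.5},
    {"State": "WI", "Salary": 7000, "Rate": 1.0},
]
dc = [("State", operator.eq), ("Salary", operator.gt), ("Rate", operator.lt)]
print(dc_violations(rows, dc))  # → [(1, 0)]
```

Only the NY pair is reported: the second record earns more than the first yet pays a lower rate. The pairwise loop is quadratic in the number of tuples, which is acceptable for illustration but not for large tables.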

CleanML: A study for evaluating the impact of data cleaning on ML classification tasks

P Li, X Rao, J Blase, Y Zhang, X Chu… - 2021 IEEE 37th …, 2021 - ieeexplore.ieee.org
Data quality affects machine learning (ML) model performances, and data scientists spend
considerable amount of time on data cleaning before model training. However, to date, there …

Conditional functional dependencies for capturing data inconsistencies

W Fan, F Geerts, X Jia, A Kementsietsidis - ACM Transactions on …, 2008 - dl.acm.org
We propose a class of integrity constraints for relational databases, referred to as conditional
functional dependencies (CFDs), and study their applications in data cleaning. In contrast to …
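A CFD pairs an embedded functional dependency X → Y with a pattern of constants and the wildcard `_` that restricts where the dependency applies. The sketch below checks one such pattern; it is a minimal illustration, not a technique taken from the paper, and the function `cfd_violations`, the `_` encoding, and the customer-style rows are hypothetical. Following the usual CFD semantics, a tuple that matches the left-hand pattern violates a constant on the right-hand side on its own, while two tuples that match the left-hand pattern and agree on X must also agree on Y.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

WILDCARD = "_"  # pattern symbol that matches any value
Row = Dict[str, str]


def matches(value: str, pattern: str) -> bool:
    """A value matches a pattern entry if the entry is the wildcard or equals it."""
    return pattern == WILDCARD or value == pattern


def cfd_violations(
    rows: List[Row],
    lhs: Sequence[str],
    rhs: Sequence[str],
    lhs_pattern: Sequence[str],
    rhs_pattern: Sequence[str],
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Check the CFD (lhs -> rhs, lhs_pattern || rhs_pattern).

    Returns (single, pairs):
      single: indices of tuples that match lhs_pattern but not rhs_pattern;
      pairs:  index pairs that match lhs_pattern, agree on lhs, and
              disagree on some rhs attribute.
    """
    single: List[int] = []
    groups: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for i, t in enumerate(rows):
        if not all(matches(t[a], p) for a, p in zip(lhs, lhs_pattern)):
            continue  # the pattern does not apply to this tuple
        if not all(matches(t[b], p) for b, p in zip(rhs, rhs_pattern)):
            single.append(i)  # violates a constant on the right-hand side
        groups[tuple(t[a] for a in lhs)].append(i)

    pairs: List[Tuple[int, int]] = []
    for members in groups.values():
        for i, j in combinations(members, 2):
            if any(rows[i][b] != rows[j][b] for b in rhs):
                pairs.append((i, j))
    return single, pairs


# Hypothetical customer tuples (CC = country code, AC = area code, CT = city).
rows = [
    {"CC": "44", "AC": "131", "ZIP": "EH4 8LE", "STR": "Mayfield", "CT": "EDI"},
    {"CC": "44", "AC": "131", "ZIP": "EH4 8LE", "STR": "Crichton", "CT": "EDI"},
    {"CC": "01", "AC": "908", "ZIP": "07974", "STR": "Tree Ave.", "CT": "MH"},
    {"CC": "01", "AC": "908", "ZIP": "07974", "STR": "Elm Str.", "CT": "NYC"},
]

# When CC = 44, ZIP determines STR.
print(cfd_violations(rows, ["CC", "ZIP"], ["STR"], ["44", WILDCARD], [WILDCARD]))
# → ([], [(0, 1)])

# When CC = 01 and AC = 908, CT must be MH.
print(cfd_violations(rows, ["CC", "AC"], ["CT"], ["01", "908"], ["MH"]))
# → ([3], [(2, 3)])
```

Rows 2 and 3 share a ZIP but differ on STR, yet the first check does not flag them, because its pattern restricts the dependency to CC = 44; a plain FD [CC, ZIP] → STR would report them. This conditional scoping is what separates CFDs from traditional FDs.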