Data cleaning: Overview and emerging challenges

X Chu, IF Ilyas, S Krishnan, J Wang - Proceedings of the 2016 …, 2016 - dl.acm.org
Detecting and repairing dirty data is one of the perennial challenges in data analytics, and
failure to do so can result in inaccurate analytics and unreliable decisions. Over the past few …

A review on data cleansing methods for big data

F Ridzuan, WMNW Zainon - Procedia Computer Science, 2019 - Elsevier
Massive amounts of data are available for the organization which will influence their
business decision. Data collected from the various resources are dirty and this will affect the …

Benchmark and survey of automated machine learning frameworks

MA Zöller, MF Huber - Journal of artificial intelligence research, 2021 - jair.org
Machine learning (ML) has become a vital part in many aspects of our daily life.
However, building well performing machine learning applications requires highly …

Data validation for machine learning

N Polyzotis, M Zinkevich, S Roy… - … of machine learning …, 2019 - proceedings.mlsys.org
Machine learning is a powerful tool for gleaning knowledge from massive amounts
of data. While a great deal of machine learning research has focused on improving the …

[BOOK][B] Data cleaning

IF Ilyas, X Chu - 2019 - books.google.com
This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …

Data lifecycle challenges in production machine learning: a survey

N Polyzotis, S Roy, SE Whang, M Zinkevich - ACM SIGMOD Record, 2018 - dl.acm.org
Machine learning has become an essential tool for gleaning knowledge from data and
tackling a diverse set of computationally hard tasks. However, the accuracy of a machine …

[BOOK][B] Magellan: Toward building entity matching management systems

PV Konda - 2018 - search.proquest.com
Entity matching (EM) identifies data instances that refer to the same real-world entity, such
as (David Smith, UWMadison) and (DM Smith, UWM). This problem has been a long …

Detecting data errors: Where are we and what needs to be done?

Z Abedjan, X Chu, D Deng, RC Fernandez… - Proceedings of the …, 2016 - dl.acm.org
Data cleaning has played a critical role in ensuring data quality for enterprise applications.
Naturally, there has been extensive research in this area, and many data cleaning …

C-store: a column-oriented DBMS

M Stonebraker, DJ Abadi, A Batkin, X Chen… - … Databases Work: the …, 2018 - dl.acm.org
This paper presents the design of a read-optimized relational DBMS that contrasts sharply
with most current systems, which are write-optimized. Among the many differences in its …

The end of an architectural era: It's time for a complete rewrite

M Stonebraker, S Madden, DJ Abadi… - … Databases Work: the …, 2018 - dl.acm.org
In previous papers [SC05, SBC+07], some of us predicted the end of "one size fits all" as a
commercial relational DBMS paradigm. These papers presented reasons and experimental …