Data cleaning: Problems and current approaches

E Rahm, HH Do - IEEE Data Eng. Bull., 2000 - cs.brown.edu
We classify data quality problems that are addressed by data cleaning and provide an
overview of the main solution approaches. Data cleaning is especially required when …

Data fusion

J Bleiholder, F Naumann - ACM Computing Surveys (CSUR), 2009 - dl.acm.org
The development of the Internet in recent years has made it possible and useful to access
many different information systems anywhere in the world to obtain information. While there …

Benchmark and survey of automated machine learning frameworks

MA Zöller, MF Huber - Journal of Artificial Intelligence Research, 2021 - jair.org
Machine learning (ML) has become a vital part in many aspects of our daily life.
However, building well performing machine learning applications requires highly …

Data Mining: The Textbook

C Aggarwal - 2015 - Springer
This textbook explores the different aspects of data mining from the fundamentals to the
complex data types and their applications, capturing the wide diversity of problem domains …

Debugging inputs

L Kirschner, E Soremekun, A Zeller - Proceedings of the ACM/IEEE 42nd …, 2020 - dl.acm.org
When a program fails to process an input, it need not be the program code that is at fault. It
can also be that the input data is faulty, for instance as result of data corruption. To get the …

Wrangler: Interactive visual specification of data transformation scripts

S Kandel, A Paepcke, J Hellerstein, J Heer - Proceedings of the SIGCHI …, 2011 - dl.acm.org
Though data analysis tools continue to improve, analysts still expend an inordinate amount
of time and effort manipulating data and assessing data quality issues. Such "data …

Potter's wheel: An interactive data cleaning system

V Raman, JM Hellerstein - VLDB, 2001 - vldb.org
Cleaning data of errors in structure and content is important for data warehousing and
integration. Current solutions for data cleaning involve many iterations of data “auditing” to …

Frameworks for entity matching: A comparison

H Köpcke, E Rahm - Data & Knowledge Engineering, 2010 - Elsevier
Entity matching is a crucial and difficult task for data integration. Entity matching frameworks
provide several methods and their combination to effectively solve different match tasks. In …

Research directions in data wrangling: Visualizations and transformations for usable and credible data

S Kandel, J Heer, C Plaisant, J Kennedy… - Information …, 2011 - journals.sagepub.com
In spite of advances in technologies for working with data, analysts still spend an inordinate
amount of time diagnosing data quality issues and manipulating data into a usable form …

Conceptual modeling for ETL processes

P Vassiliadis, A Simitsis, S Skiadopoulos - Proceedings of the 5th ACM …, 2002 - dl.acm.org
Extraction-Transformation-Loading (ETL) tools are pieces of software responsible for the
extraction of data from several sources, their cleansing, customization and insertion into a …