Data cleaning: Problems and current approaches

E Rahm, HH Do - IEEE Data Eng. Bull., 2000 - cs.brown.edu
We classify data quality problems that are addressed by data cleaning and provide an
overview of the main solution approaches. Data cleaning is especially required when …

Knowledge discovery and interactive data mining in bioinformatics: state-of-the-art, future challenges and research directions

A Holzinger, M Dehmer, I Jurisica - BMC bioinformatics, 2014 - Springer
Background: The life sciences, biomedicine and health care are increasingly turning into a
data intensive science [2–4]. Particularly in bioinformatics and computational biology we …

Interactive deduplication using active learning

S Sarawagi, A Bhamidipaty - Proceedings of the eighth ACM SIGKDD …, 2002 - dl.acm.org
Deduplication is a key operation in integrating data from multiple sources. The main
challenge in this task is designing a function that can resolve when a pair of records refer to …

Potter's wheel: An interactive data cleaning system

V Raman, JM Hellerstein - VLDB, 2001 - vldb.org
Cleaning data of errors in structure and content is important for data warehousing and
integration. Current solutions for data cleaning involve many iterations of data “auditing” to …

[BOOK] Quality-driven query answering for integrated information systems

F Naumann - 2002 - Springer
8 Completeness-Driven Query Optimization. Completeness measures the usefulness
of a source or of a plan to answer a …

Learning object identification rules for information integration

S Tejada, CA Knoblock, S Minton - Information Systems, 2001 - Elsevier
When integrating information from multiple websites, the same data objects can exist in
inconsistent text formats across sites, making it difficult to identify matching objects using …

Mining database structure; or, how to build a data quality browser

T Dasu, T Johnson, S Muthukrishnan… - Proceedings of the 2002 …, 2002 - dl.acm.org
Data mining research typically assumes that the data to be analyzed has been identified,
gathered, cleaned, and processed into a convenient form. While data mining tools greatly …

Representing data quality in sensor data streaming environments

A Klein, W Lehner - Journal of Data and Information Quality (JDIQ), 2009 - dl.acm.org
Sensors in smart-item environments capture data about product conditions and usage to
support business decisions as well as production automation processes. A challenging …

IntelliClean: a knowledge-based intelligent data cleaner

ML Lee, TW Ling, WL Low - Proceedings of the sixth ACM SIGKDD …, 2000 - dl.acm.org
Existing data cleaning methods work on the basis of computing the degree of similarity
between nearby records in a sorted database. High recall is achieved by accepting records …

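A survey of research on data quality and data cleaning

郭志懋 (Guo Zhimao), 周傲英 (Zhou Aoying) - 软件学报 (Journal of Software), 2002 - Citeseer
Research on data quality, and on data cleaning in particular, is surveyed. The importance of
data quality and its metrics are first described, and the data cleaning problem is defined. Data cleaning problems are then classified, and approaches to solving them are analyzed …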