[PDF][PDF] Data cleaning: Problems and current approaches
E Rahm, HH Do - IEEE Data Eng. Bull., 2000 - cs.brown.edu
We classify data quality problems that are addressed by data cleaning and provide an
overview of the main solution approaches. Data cleaning is especially required when …
Knowledge discovery and interactive data mining in bioinformatics - state-of-the-art, future challenges and research directions
Background: The life sciences, biomedicine and health care are increasingly turning into a
data intensive science [2–4]. Particularly in bioinformatics and computational biology we …
Interactive deduplication using active learning
Deduplication is a key operation in integrating data from multiple sources. The main
challenge in this task is designing a function that can resolve when a pair of records refer to …
[PDF][PDF] Potter's wheel: An interactive data cleaning system
V Raman, JM Hellerstein - VLDB, 2001 - vldb.org
Cleaning data of errors in structure and content is important for data warehousing and
integration. Current solutions for data cleaning involve many iterations of data “auditing” to …
[BOOK][B] Quality-driven query answering for integrated information systems
F Naumann - 2002 - Springer
Chapter 8, Completeness-Driven Query Optimization: Completeness measures the usefulness
of a source or of a plan to answer a …
Learning object identification rules for information integration
When integrating information from multiple websites, the same data objects can exist in
inconsistent text formats across sites, making it difficult to identify matching objects using …
Mining database structure; or, how to build a data quality browser
T Dasu, T Johnson, S Muthukrishnan… - Proceedings of the 2002 …, 2002 - dl.acm.org
Data mining research typically assumes that the data to be analyzed has been identified,
gathered, cleaned, and processed into a convenient form. While data mining tools greatly …
Representing data quality in sensor data streaming environments
A Klein, W Lehner - Journal of Data and Information Quality (JDIQ), 2009 - dl.acm.org
Sensors in smart-item environments capture data about product conditions and usage to
support business decisions as well as production automation processes. A challenging …
[PDF][PDF] Intelliclean: a knowledge-based intelligent data cleaner
Existing data cleaning methods work on the basis of computing the degree of similarity
between nearby records in a sorted database. High recall is achieved by accepting records …
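[PDF][PDF] A survey of research on data quality and data cleaning
郭志懋, 周傲英 - Journal of Software (软件学报), 2002 - Citeseer
This paper surveys research on data quality, and on data cleaning in particular. It first explains the importance of data quality and the metrics used to measure it,
and defines the data cleaning problem. It then classifies data cleaning problems and analyzes approaches to solving them …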