[BOOK] The data matching process

P Christen - 2012 - Springer
This chapter provides an overview of the data matching process, and describes the five
major steps involved in this process: data pre-processing (cleaning and standardisation) …

Data-Centric Systems and Applications

MJ Carey, S Ceri, P Bernstein, U Dayal, C Faloutsos… - Italy: Springer, 2006 - Springer
The rapid growth of the Web in the past two decades has made it the largest publicly
accessible data source in the world. Web mining aims to discover useful information or …

[BOOK] Foundations of data quality management

W Fan, F Geerts - 2012 - books.google.com
Data quality is one of the most important problems in data management. A database system
typically aims to support the creation, maintenance and use of large amounts of data …

Trends in cleaning relational data: Consistency and deduplication

IF Ilyas, X Chu - Foundations and Trends® in Databases, 2015 - nowpublishers.com
Data quality is one of the most important problems in data management, since dirty data
often leads to inaccurate data analytics results and wrong business decisions. Poor data …

Pay-as-you-go entity resolution

SE Whang, D Marmaros… - IEEE Transactions on …, 2012 - ieeexplore.ieee.org
Entity resolution (ER) is the problem of identifying which records in a database refer to the
same entity. In practice, many applications need to resolve large data sets efficiently, but do …

Incremental record linkage

A Gruenheid, XL Dong, D Srivastava - Proceedings of the VLDB …, 2014 - dl.acm.org
Record linkage clusters records such that each cluster corresponds to a single distinct
real-world entity. It is a crucial step in data cleaning and data integration. In the big data era, the …

[PDF] An important aspect of big data: data usability

**建中, 刘显敏 - 计算机研究与发展 (Journal of Computer Research and Development), 2013 - cs.sjtu.edu.cn

Rule-based method for entity resolution

L Li, J Li, H Gao - IEEE Transactions on Knowledge and Data …, 2014 - ieeexplore.ieee.org
The objective of entity resolution (ER) is to identify records referring to the same real-world
entity. Traditional ER approaches identify records based on pairwise similarity comparisons …

Query-driven approach to entity resolution

H Altwaijry, DV Kalashnikov, S Mehrotra - Proceedings of the VLDB …, 2013 - dl.acm.org
This paper explores "on-the-fly" data cleaning in the context of a user query. A novel
Query-Driven Approach (QDA) is developed that performs a minimal number of cleaning steps that …

Incremental entity resolution on rules and data

SE Whang, H Garcia-Molina - The VLDB journal, 2014 - Springer
Entity resolution (ER) identifies database records that refer to the same real-world entity. In
practice, ER is not a one-time process, but is constantly improved as the data, schema and …