An overview of end-to-end entity resolution for big data

V Christophides, V Efthymiou, T Palpanas… - ACM Computing …, 2020 - dl.acm.org
One of the most critical tasks for improving data quality and increasing the reliability of data
analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to …

Data cleaning: Overview and emerging challenges

X Chu, IF Ilyas, S Krishnan, J Wang - Proceedings of the 2016 …, 2016 - dl.acm.org
Detecting and repairing dirty data is one of the perennial challenges in data analytics, and
failure to do so can result in inaccurate analytics and unreliable decisions. Over the past few …

[BOOK][B] Data cleaning

IF Ilyas, X Chu - 2019 - books.google.com
This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …

Data and information quality

C Batini, M Scannapieco - Cham, Switzerland: Springer International …, 2016 - Springer
This book is the result of a study path that started in 2006, when the two authors of this book
published the book Data Quality: Concepts, Methodologies and Techniques. After 8 years …

Duplicate record detection: A survey

AK Elmagarmid, PG Ipeirotis… - IEEE Transactions on …, 2006 - ieeexplore.ieee.org
Often, in the real world, entities have two or more representations in databases. Duplicate
records do not share a common key and/or they contain errors that make duplicate matching …

Information extraction

S Sarawagi - Foundations and Trends® in Databases, 2008 - nowpublishers.com
The automatic extraction of information from unstructured sources has opened up new
avenues for querying, organizing, and analyzing data by drawing upon the clean semantics …

Link mining: a survey

L Getoor, CP Diehl - ACM SIGKDD Explorations Newsletter, 2005 - dl.acm.org
Many datasets of interest today are best described as a linked collection of interrelated
objects. These may represent homogeneous networks, in which there is a single-object type …

[BOOK][B] Exploratory data mining and data cleaning

T Dasu, T Johnson - 2003 - books.google.com
Written for practitioners of data mining, data cleaning and database management. Presents
a technical treatment of data quality including process, metrics, tools and algorithms …

[BOOK][B] Foundations of data quality management

W Fan, F Geerts - 2012 - books.google.com
Data quality is one of the most important problems in data management. A database system
typically aims to support the creation, maintenance and use of large amount of data …

Evaluation of entity resolution approaches on real-world match problems

H Köpcke, A Thor, E Rahm - Proceedings of the VLDB Endowment, 2010 - dl.acm.org
Despite the huge amount of recent research efforts on entity resolution (matching) there has
not yet been a comparative evaluation on the relative effectiveness and efficiency of …