An overview of end-to-end entity resolution for big data

V Christophides, V Efthymiou, T Palpanas… - ACM Computing …, 2020 - dl.acm.org
One of the most critical tasks for improving data quality and increasing the reliability of data
analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to …

User identity linkage across online social networks: A review

K Shu, S Wang, J Tang, R Zafarani, H Liu - ACM SIGKDD Explorations …, 2017 - dl.acm.org
The increasing popularity and diversity of social media sites has encouraged more and
more people to participate on multiple online social networks to enjoy their services. Each …

Data and information quality

C Batini, M Scannapieco - Cham, Switzerland: Springer International …, 2016 - Springer
This book is the result of a study path that started in 2006, when the two authors of this book
published the book Data Quality: Concepts, Methodologies and Techniques. After 8 years …

[BOOK] The data matching process

P Christen - 2012 - Springer
This chapter provides an overview of the data matching process, and describes the five
major steps involved in this process: data pre-processing (cleaning and standardisation) …

A survey of indexing techniques for scalable record linkage and deduplication

P Christen - IEEE Transactions on Knowledge and Data …, 2011 - ieeexplore.ieee.org
Record linkage is the process of matching records from several databases that refer to the
same entities. When applied on a single database, this process is known as deduplication …

[BOOK] Ontology matching

J Euzenat, P Shvaiko - 2007 - Springer
An ontology typically provides a vocabulary describing a domain of interest and a
specification of the meaning of terms in that vocabulary. Depending on the precision of this …

Duplicate record detection: A survey

AK Elmagarmid, PG Ipeirotis… - IEEE Transactions on …, 2006 - ieeexplore.ieee.org
Often, in the real world, entities have two or more representations in databases. Duplicate
records do not share a common key and/or they contain errors that make duplicate matching …

Evaluation of entity resolution approaches on real-world match problems

H Köpcke, A Thor, E Rahm - Proceedings of the VLDB Endowment, 2010 - dl.acm.org
Despite the huge amount of recent research efforts on entity resolution (matching) there has
not yet been a comparative evaluation on the relative effectiveness and efficiency of …

Data-Centric Systems and Applications

MJ Carey, S Ceri, P Bernstein, U Dayal, C Faloutsos… - Italy: Springer, 2006 - Springer
The rapid growth of the Web in the past two decades has made it the largest publicly
accessible data source in the world. Web mining aims to discover useful information or …

NADEEF: a commodity data cleaning system

M Dallachiesa, A Ebaid, A Eldawy… - Proceedings of the …, 2013 - dl.acm.org
Despite the increasing importance of data quality and the rich theoretical and practical
contributions in all aspects of data cleaning, there is no single end-to-end off-the-shelf …