An overview of end-to-end entity resolution for big data

V Christophides, V Efthymiou, T Palpanas… - ACM Computing …, 2020 - dl.acm.org
One of the most critical tasks for improving data quality and increasing the reliability of data
analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to …

Blocking and filtering techniques for entity resolution: A survey

G Papadakis, D Skoutas, E Thanos… - ACM Computing Surveys …, 2020 - dl.acm.org
Entity Resolution (ER), a core task of Data Integration, detects different entity profiles that
correspond to the same real-world object. Due to its inherently quadratic complexity, a series …

[BOOK][B] The data matching process

P Christen - 2012 - Springer
This chapter provides an overview of the data matching process, and describes the five
major steps involved in this process: data pre-processing (cleaning and standardisation) …

A survey of indexing techniques for scalable record linkage and deduplication

P Christen - IEEE Transactions on Knowledge and Data …, 2011 - ieeexplore.ieee.org
Record linkage is the process of matching records from several databases that refer to the
same entities. When applied on a single database, this process is known as deduplication …

Duplicate record detection: A survey

AK Elmagarmid, PG Ipeirotis… - IEEE Transactions on …, 2006 - ieeexplore.ieee.org
Often, in the real world, entities have two or more representations in databases. Duplicate
records do not share a common key and/or they contain errors that make duplicate matching …

Robust and fast similarity search for moving object trajectories

L Chen, MT Özsu, V Oria - Proceedings of the 2005 ACM SIGMOD …, 2005 - dl.acm.org
An important consideration in similarity-based retrieval of moving object trajectories is the
definition of a distance function. The existing distance functions are usually sensitive to …

Data-Centric Systems and Applications

MJ Carey, S Ceri, P Bernstein, U Dayal, C Faloutsos… - Italy: Springer, 2006 - Springer
The rapid growth of the Web in the past two decades has made it the largest publicly
accessible data source in the world. Web mining aims to discover useful information or …

[BOOK][B] Exploratory data mining and data cleaning

T Dasu, T Johnson - 2003 - books.google.com
Written for practitioners of data mining, data cleaning and database management. Presents
a technical treatment of data quality including process, metrics, tools and algorithms …

[BOOK][B] Similarity search: the metric space approach

P Zezula, G Amato, V Dohnal, M Batko - 2006 - books.google.com
The area of similarity searching is a very hot topic for both research and commercial
applications. Current data processing applications use data with considerably less structure …

A survey on blocking technology of entity resolution

BH Li, Y Liu, AM Zhang, WH Wang, S Wan - Journal of Computer Science …, 2020 - Springer
Entity resolution (ER) is a significant task in data integration, which aims to detect all entity
profiles that correspond to the same real-world entity. Due to its inherently quadratic …