Blocking and filtering techniques for entity resolution: A survey
Entity Resolution (ER), a core task of Data Integration, detects different entity profiles that
correspond to the same real-world object. Due to its inherently quadratic complexity, a series …
String similarity search and join: a survey
String similarity search and join are two important operations in data cleaning and
integration, which extend traditional exact search and exact join operations in databases by …
A survey of indexing techniques for scalable record linkage and deduplication
P Christen - IEEE transactions on knowledge and data …, 2011 - ieeexplore.ieee.org
Record linkage is the process of matching records from several databases that refer to the
same entities. When applied on a single database, this process is known as deduplication …
Efficient similarity joins for near-duplicate detection
With the increasing amount of data and the need to integrate data from multiple data
sources, one of the challenging issues is to identify near-duplicate records efficiently. In this …
Efficient parallel set-similarity joins using mapreduce
In this paper we study how to efficiently perform set-similarity joins in parallel using the
popular MapReduce framework. We propose a 3-stage approach for end-to-end set …
Frameworks for entity matching: A comparison
Entity matching is a crucial and difficult task for data integration. Entity matching frameworks
provide several methods and their combination to effectively solve different match tasks. In …
Modern privacy-preserving record linkage techniques: An overview
Record linkage is the challenging task of deciding which records, coming from disparate
data sources, refer to the same entity. Established back in 1946 by Halbert L. Dunn, the area …
Fast-join: An efficient method for fuzzy token matching based string similarity join
String similarity join that finds similar string pairs between two string sets is an essential
operation in many applications, and has attracted significant attention recently in the …
Falcon: Scaling up hands-off crowdsourced entity matching to build cloud services
Many works have applied crowdsourcing to entity matching (EM). While promising, these
approaches are limited in that they often require a developer to be in the loop. As such, it is …
Three-dimensional entity resolution with JedAI
Entity Resolution (ER) is the task of detecting different entity profiles that describe the same
real-world objects. To facilitate its execution, we have developed JedAI, an open-source …