An overview of end-to-end entity resolution for big data
One of the most critical tasks for improving data quality and increasing the reliability of data
analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to …
Blocking and filtering techniques for entity resolution: A survey
Entity Resolution (ER), a core task of Data Integration, detects different entity profiles that
correspond to the same real-world object. Due to its inherently quadratic complexity, a series …
Can we beat the prefix filtering? An adaptive framework for similarity join and search
As two important operations in data cleaning, similarity join and similarity search have
attracted much attention recently. Existing methods to support similarity join usually adopt a …
String similarity search and join: a survey
String similarity search and join are two important operations in data cleaning and
integration, which extend traditional exact search and exact join operations in databases by …
String similarity joins: An experimental evaluation
String similarity join is an important operation in data integration and cleansing that finds
similar string pairs from two collections of strings. More than ten algorithms have been …
Pass-join: A partition-based method for similarity joins
As an essential operation in data cleaning, the similarity join has attracted considerable
attention from the database community. In this paper, we study string similarity joins with edit …
Falcon: Scaling up hands-off crowdsourced entity matching to build cloud services
Many works have applied crowdsourcing to entity matching (EM). While promising, these
approaches are limited in that they often require a developer to be in the loop. As such, it is …
Fast-join: An efficient method for fuzzy token matching based string similarity join
String similarity join that finds similar string pairs between two string sets is an essential
operation in many applications, and has attracted significant attention recently in the …
Compression of uncertain trajectories in road networks
Massive volumes of uncertain trajectory data are being generated by GPS devices. Due to
the limitations of GPS data, these trajectories are generally uncertain. This state of affairs …
Massjoin: A mapreduce-based method for scalable string similarity joins
String similarity join is an essential operation in data integration. The era of big data calls for
scalable algorithms to support large-scale string similarity joins. In this paper, we study …