An overview of end-to-end entity resolution for big data

V Christophides, V Efthymiou, T Palpanas… - ACM Computing …, 2020 - dl.acm.org
One of the most critical tasks for improving data quality and increasing the reliability of data
analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to …

Blocking and filtering techniques for entity resolution: A survey

G Papadakis, D Skoutas, E Thanos… - ACM Computing Surveys …, 2020 - dl.acm.org
Entity Resolution (ER), a core task of Data Integration, detects different entity profiles that
correspond to the same real-world object. Due to its inherently quadratic complexity, a series …

Can we beat the prefix filtering? An adaptive framework for similarity join and search

J Wang, G Li, J Feng - Proceedings of the 2012 ACM SIGMOD …, 2012 - dl.acm.org
As two important operations in data cleaning, similarity join and similarity search have
attracted much attention recently. Existing methods to support similarity join usually adopt a …

String similarity search and join: a survey

M Yu, G Li, D Deng, J Feng - Frontiers of Computer Science, 2016 - Springer
String similarity search and join are two important operations in data cleaning and
integration, which extend traditional exact search and exact join operations in databases by …

String similarity joins: An experimental evaluation

Y Jiang, G Li, J Feng, WS Li - Proceedings of the VLDB Endowment, 2014 - dl.acm.org
String similarity join is an important operation in data integration and cleansing that finds
similar string pairs from two collections of strings. More than ten algorithms have been …

Pass-join: A partition-based method for similarity joins

G Li, D Deng, J Wang, J Feng - arXiv preprint arXiv:1111.7171, 2011 - arxiv.org
As an essential operation in data cleaning, the similarity join has attracted considerable
attention from the database community. In this paper, we study string similarity joins with edit …

Falcon: Scaling up hands-off crowdsourced entity matching to build cloud services

S Das, PS GC, AH Doan, JF Naughton… - Proceedings of the …, 2017 - dl.acm.org
Many works have applied crowdsourcing to entity matching (EM). While promising, these
approaches are limited in that they often require a developer to be in the loop. As such, it is …

Fast-join: An efficient method for fuzzy token matching based string similarity join

J Wang, G Li, J Feng - 2011 IEEE 27th International Conference …, 2011 - ieeexplore.ieee.org
String similarity join that finds similar string pairs between two string sets is an essential
operation in many applications, and has attracted significant attention recently in the …

Compression of uncertain trajectories in road networks

T Li, R Huang, L Chen, CS Jensen… - Proceedings of the VLDB …, 2020 - vbn.aau.dk
Massive volumes of uncertain trajectory data are being generated by GPS devices. Due to
the limitations of GPS data, these trajectories are generally uncertain. This state of affairs …

MassJoin: A MapReduce-based method for scalable string similarity joins

D Deng, G Li, S Hao, J Wang… - 2014 IEEE 30th …, 2014 - ieeexplore.ieee.org
String similarity join is an essential operation in data integration. The era of big data calls for
scalable algorithms to support large-scale string similarity joins. In this paper, we study …