Blocking and filtering techniques for entity resolution: A survey

G Papadakis, D Skoutas, E Thanos… - ACM Computing Surveys …, 2020 - dl.acm.org
Entity Resolution (ER), a core task of Data Integration, detects different entity profiles that
correspond to the same real-world object. Due to its inherently quadratic complexity, a series …
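The quadratic cost comes from comparing every pair of profiles. As an illustration only, here is a minimal sketch of one common blocking scheme, token blocking. It assumes each profile is a plain string and that sharing any lowercase, whitespace-delimited token is enough to make two profiles a candidate pair. The function name and sample data are hypothetical and are not taken from the survey.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(profiles):
    """Return candidate pairs of profile ids that share at least one token.

    profiles: dict mapping a (hypothetical) profile id to its text.
    """
    blocks = defaultdict(set)  # token -> ids of profiles containing it
    for pid, text in profiles.items():
        for token in set(text.lower().split()):
            blocks[token].add(pid)
    candidates = set()
    for ids in blocks.values():
        # Only profiles that co-occur in some block are ever compared.
        candidates.update(combinations(sorted(ids), 2))
    return candidates


profiles = {
    1: "John Smith London",
    2: "J. Smith London",
    3: "Maria Garcia Madrid",
}
print(sorted(token_blocking(profiles)))  # [(1, 2)]
```

Profile 3 shares no token with the others, so only one of the three possible comparisons survives. A fuller pipeline might also prune oversized blocks, for example with block purging or meta-blocking. This sketch omits those steps.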

String similarity search and join: a survey

M Yu, G Li, D Deng, J Feng - Frontiers of Computer Science, 2016 - Springer
String similarity search and join are two important operations in data cleaning and
integration, which extend traditional exact search and exact join operations in databases by …

Can we beat the prefix filtering? An adaptive framework for similarity join and search

J Wang, G Li, J Feng - Proceedings of the 2012 ACM SIGMOD …, 2012 - dl.acm.org
As two important operations in data cleaning, similarity join and similarity search have
attracted much attention recently. Existing methods to support similarity join usually adopt a …
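Prefix filtering, the baseline named in this title, can be sketched as a self-join on token sets under Jaccard similarity. This is a minimal sketch of the standard filter, not the adaptive framework the paper proposes. The function names, the rarest-first global token order, and the small epsilon that guards the ceiling against floating-point error are all assumptions made for illustration. The filter relies on one property: if two records reach Jaccard threshold t, then under a fixed global token order they must share a token within the first |r| - ceil(t|r|) + 1 tokens of each record. Only records that meet in a prefix are therefore verified.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    """Jaccard similarity of two non-empty token sets."""
    return len(a & b) / len(a | b)


def prefix_filter_join(records, t):
    """Self-join: return index pairs (j, i), j < i, whose token sets have Jaccard >= t (0 < t <= 1)."""
    # Fix one global token order for every record: rarest tokens first.
    freq = Counter(tok for r in records for tok in set(r))
    ordered = [sorted(set(r), key=lambda tok: (freq[tok], tok)) for r in records]
    index = defaultdict(list)  # token -> ids of earlier records with that token in their prefix
    results = set()
    for i, toks in enumerate(ordered):
        # Prefix length |r| - ceil(t*|r|) + 1; the epsilon guards ceil() against float error.
        p = len(toks) - math.ceil(t * len(toks) - 1e-9) + 1
        candidates = set()
        for tok in toks[:p]:
            candidates.update(index[tok])  # earlier records sharing this prefix token
            index[tok].append(i)
        for j in candidates:
            # The filter only prunes; every surviving candidate is verified exactly.
            if jaccard(set(toks), set(ordered[j])) >= t:
                results.add((j, i))
    return results


records = [
    ["data", "cleaning", "join"],
    ["data", "cleaning", "search"],
    ["graph", "edit", "distance"],
    ["data", "join", "cleaning"],
]
print(sorted(prefix_filter_join(records, 0.5)))  # [(0, 1), (0, 3), (1, 3)]
```

Ordering tokens rarest-first keeps short the inverted lists that the prefixes probe. That is why it is the usual choice, although any fixed global order preserves correctness.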

String similarity joins: An experimental evaluation

Y Jiang, G Li, J Feng, WS Li - Proceedings of the VLDB Endowment, 2014 - dl.acm.org
String similarity join is an important operation in data integration and cleansing that finds
similar string pairs from two collections of strings. More than ten algorithms have been …

MassJoin: A MapReduce-based method for scalable string similarity joins

D Deng, G Li, S Hao, J Wang… - 2014 IEEE 30th …, 2014 - ieeexplore.ieee.org
String similarity join is an essential operation in data integration. The era of big data calls for
scalable algorithms to support large-scale string similarity joins. In this paper, we study …

Efficient approximate entity matching using Jaro-Winkler distance

Y Wang, J Qin, W Wang - International conference on web information …, 2017 - Springer
Jaro-Winkler distance is a measure of the similarity between two strings.
Since Jaro-Winkler distance performs well in matching personal and entity names, it is …
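For reference, here is a minimal sketch of the textbook Jaro and Jaro-Winkler similarity. It assumes the common defaults of a 0.1 prefix scale and a common prefix capped at four characters. It computes the measure only and is not the paper's matching algorithm. The sketch returns a similarity in [0, 1], and 1 minus that value is commonly used as the distance. Some variants apply the prefix boost only above a threshold such as 0.7; this sketch always applies it.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    m = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Transpositions: half the number of matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro similarity boosted by the length of the common prefix (capped at max_prefix)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


print(round(jaro("MARTHA", "MARHTA"), 4))          # 0.9444
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

The prefix boost is what makes the measure favour names that agree at the start, which fits the personal-name matching use case mentioned in the abstract.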

Indexing metric spaces for exact similarity search

L Chen, Y Gao, X Song, Z Li, Y Zhu, X Miao… - ACM Computing …, 2022 - dl.acm.org
With the continued digitization of societal processes, we are seeing an explosion in
available data. This is referred to as big data. In a research setting, three aspects of the data …

String similarity measures and joins with synonyms

J Lu, C Lin, W Wang, C Li, H Wang - Proceedings of the 2013 ACM …, 2013 - dl.acm.org
A string similarity measure quantifies the similarity between two text strings for approximate
string matching or comparison. For example, the strings "Sam" and "Samuel" can be …

EmbedJoin: Efficient edit similarity joins via embeddings

H Zhang, Q Zhang - Proceedings of the 23rd ACM SIGKDD international …, 2017 - dl.acm.org
We study the problem of edit similarity joins, where given a set of strings and a threshold
value K, we want to output all pairs of strings whose edit distances are at most K. Edit …
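Here is a minimal baseline for this problem. It assumes plain Python strings, reports pairs as index tuples (i, j) with i < j, and uses a length filter as its only pruning step. It is a naive quadratic sketch for illustration, not the embedding-based method EmbedJoin proposes. Two strings whose lengths differ by more than K cannot be within edit distance K. After sorting by length, the inner loop can therefore stop as soon as the length gap exceeds K.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or match at no cost
        prev = cur
    return prev[-1]


def edit_similarity_join(strings, k):
    """Return sorted index pairs (i, j), i < j, with edit distance <= k."""
    # Visit strings in length order so the length filter can end the inner loop early.
    order = sorted(range(len(strings)), key=lambda i: len(strings[i]))
    result = []
    for x, i in enumerate(order):
        for j in order[x + 1:]:
            if len(strings[j]) - len(strings[i]) > k:
                break  # every remaining string is at least this much longer
            if edit_distance(strings[i], strings[j]) <= k:
                result.append((min(i, j), max(i, j)))
    return sorted(result)


words = ["kitten", "sitting", "mitten", "fitting", "knitting"]
print(edit_similarity_join(words, 2))  # [(0, 2), (1, 3), (1, 4), (3, 4)]
```

Every pair that passes the length filter still pays for a full dynamic-programming verification. That quadratic verification cost is what dedicated edit similarity join methods aim to avoid.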

Efficient graph similarity joins with edit distance constraints

X Zhao, C Xiao, X Lin, W Wang - 2012 IEEE 28th international …, 2012 - ieeexplore.ieee.org
Graphs are widely used to model complicated data semantics in many applications in
bioinformatics, chemistry, social networks, pattern recognition, etc. A recent trend is to …