An overview of end-to-end entity resolution for big data

V Christophides, V Efthymiou, T Palpanas… - ACM Computing …, 2020 - dl.acm.org
One of the most critical tasks for improving data quality and increasing the reliability of data
analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to …

Blocking and filtering techniques for entity resolution: A survey

G Papadakis, D Skoutas, E Thanos… - ACM Computing Surveys …, 2020 - dl.acm.org
Entity Resolution (ER), a core task of Data Integration, detects different entity profiles that
correspond to the same real-world object. Due to its inherently quadratic complexity, a series …

SourcererCC: Scaling code clone detection to big-code

H Sajnani, V Saini, J Svajlenko, CK Roy… - Proceedings of the 38th …, 2016 - dl.acm.org
Despite a decade of active research, there has been a marked lack in clone detection
techniques that scale to large repositories for detecting near-miss clones. In this paper, we …

Constructing an interactive natural language interface for relational databases

F Li, HV Jagadish - Proceedings of the VLDB Endowment, 2014 - dl.acm.org
Natural language has been the holy grail of query interface designers, but has generally
been considered too hard to work with, except in limited specific circumstances. In this …

Efficient k-nearest neighbor graph construction for generic similarity measures

W Dong, C Moses, K Li - … of the 20th international conference on World …, 2011 - dl.acm.org
K-Nearest Neighbor Graph (K-NNG) construction is an important operation with many web
related applications, including collaborative filtering, similarity search, and many others in …

CrowdER: Crowdsourcing entity resolution

J Wang, T Kraska, MJ Franklin, J Feng - arXiv preprint arXiv:1208.1927, 2012 - arxiv.org
Entity resolution is central to data integration and data cleaning. Algorithmic approaches
have been improving in quality, but remain far from perfect. Crowdsourcing platforms offer a …

DataTone: Managing ambiguity in natural language interfaces for data visualization

T Gao, M Dontcheva, E Adar, Z Liu… - Proceedings of the 28th …, 2015 - dl.acm.org
Answering questions with data is a difficult and time-consuming process. Visual dashboards
and templates make it easy to get started, but asking more sophisticated questions often …

JOSIE: Overlap set similarity search for finding joinable tables in data lakes

E Zhu, D Deng, F Nargesian, RJ Miller - Proceedings of the 2019 …, 2019 - dl.acm.org
We present a new solution for finding joinable tables in massive data lakes: given a table
and one join column, find tables that can be joined with the given table on the largest …

Practical non-interactive searchable encryption with forward and backward privacy

SF Sun, R Steinfeld, S Lai, X Yuan… - Usenix Network and …, 2021 - research.monash.edu
In Dynamic Symmetric Searchable Encryption (DSSE), forward privacy ensures that
previous search queries cannot be associated with future updates, while backward privacy …

Evaluation of entity resolution approaches on real-world match problems

H Köpcke, A Thor, E Rahm - Proceedings of the VLDB Endowment, 2010 - dl.acm.org
Despite the huge amount of recent research efforts on entity resolution (matching) there has
not yet been a comparative evaluation on the relative effectiveness and efficiency of …