Blocking and filtering techniques for entity resolution: A survey
Entity Resolution (ER), a core task of Data Integration, detects different entity profiles that
correspond to the same real-world object. Due to its inherently quadratic complexity, a series …
ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms
M Aumüller, E Bernhardsson, A Faithfull - Information Systems, 2020 - Elsevier
This paper describes ANN-Benchmarks, a tool for evaluating the performance of in-memory
approximate nearest neighbor algorithms. It provides a standard interface for measuring the …
A survey of blocking and filtering techniques for entity resolution
Efficiency techniques are an integral part of Entity Resolution, since its infancy. In this
survey, we organized the bulk of works in the field into Blocking, Filtering and hybrid …
MR-MVPP: A map-reduce-based approach for creating MVPP in data warehouses for big data applications
Materialized view selection (MVS) is the problem of selecting an appropriate set of views to
be materialized to speed up analytical query processing of data warehouses. Online …
Pigeonring: A principle for faster thresholded similarity search
The pigeonhole principle states that if $n$ items are contained in $m$ boxes, then at least
one box has no more than $n/m$ items. It is utilized to solve many data management …
Tokenjoin: efficient filtering for set similarity join with maximum weighted bipartite matching
Set similarity join is an important problem with many applications in data discovery, cleaning
and integration. To increase robustness, fuzzy set similarity join calculates the similarity of …
Fast locality-sensitive hashing frameworks for approximate near neighbor search
T Christiani - International Conference on Similarity Search and …, 2019 - Springer
The Indyk-Motwani Locality-Sensitive Hashing (LSH) framework (STOC 1998) is a
general technique for constructing a data structure to answer approximate near neighbor …
Higher-order count sketch: Dimensionality reduction that retains efficient tensor operations
Sketching is a randomized dimensionality-reduction method that aims to preserve relevant
information in large-scale datasets. Count sketch is a simple popular sketch which uses a …
PPIS-JOIN: A novel privacy-preserving image similarity join method
Recently, massive multimedia data (especially images) is moved to the cloud environment
for analysis and retrieval, which makes data security issue become particularly significant …
Metricjoin: Leveraging metric properties for robust exact set similarity joins
Given two collections of sets, the set similarity join reports all pairs of sets that are within a
given distance threshold. State-of-the-art solutions employ an inverted list index and several …