Optimal densification for fast and accurate minwise hashing

A Shrivastava - International Conference on Machine …, 2017 - proceedings.mlr.press
Minwise hashing is a fundamental and one of the most successful hashing algorithms in the
literature. Recent advances based on the idea of densification (Shrivastava & Li, 2014) have …

Index structures for fast similarity search for binary vectors

DA Rachkovskij - Cybernetics and Systems Analysis, 2017 - Springer
This article reviews index structures for fast similarity search for objects represented by
binary vectors (with components equal to 0 or 1). Structures for both exact and approximate …

PUFFINN: parameterless and universally fast finding of nearest neighbors

M Aumüller, T Christiani, R Pagh, M Vesterli - arXiv preprint arXiv …, 2019 - arxiv.org
We present PUFFINN, a parameterless LSH-based index for solving the k-nearest
neighbor problem with probabilistic guarantees. By parameterless we mean that the user is …

BagMinHash: minwise hashing algorithm for weighted sets

O Ertl - Proceedings of the 24th ACM SIGKDD International …, 2018 - dl.acm.org
Minwise hashing has become a standard tool to calculate signatures which allow direct
estimation of Jaccard similarities. While very efficient algorithms already exist for the …

Neural distributed autoassociative memories: A survey

VI Gritsenko, DA Rachkovskij, AA Frolov… - arXiv preprint arXiv …, 2017 - arxiv.org
Neural network models of autoassociative, distributed memory allow storage
and retrieval of many items (vectors) where the number of stored items can exceed the …

Bidirectionally densifying LSH sketches with empty bins

P Jia, P Wang, J Zhao, S Zhang, Y Qi, M Hu… - Proceedings of the …, 2021 - dl.acm.org
As an efficient tool for approximate similarity computation and search, Locality Sensitive
Hashing (LSH) has been widely used in many research areas including databases, data …

GB-KMV: An augmented KMV sketch for approximate containment similarity search

Y Yang, Y Zhang, W Zhang… - 2019 IEEE 35th …, 2019 - ieeexplore.ieee.org
In this paper, we study the problem of approximate containment similarity search. Given two
records Q and X, the containment similarity between Q and X with respect to Q is |Q ∩ …

ProbMinHash: a class of locality-sensitive hash algorithms for the (probability) Jaccard similarity

O Ertl - IEEE Transactions on Knowledge and Data …, 2020 - ieeexplore.ieee.org
The probability Jaccard similarity was recently proposed as a natural generalization of the
Jaccard similarity to measure the proximity of sets whose elements are associated with …

Fast locality-sensitive hashing frameworks for approximate near neighbor search

T Christiani - International Conference on Similarity Search and …, 2019 - Springer
The Indyk-Motwani Locality-Sensitive Hashing (LSH) framework (STOC 1998) is a
general technique for constructing a data structure to answer approximate near neighbor …

Effective indexing for dynamic structural graph clustering

F Zhang, S Wang - Proceedings of the VLDB Endowment, 2022 - dl.acm.org
Graph clustering is a fundamental data mining task that clusters vertices into different
groups. The structural graph clustering algorithm (SCAN) is a widely used graph clustering …
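Several of the entries above build on minwise hashing, which the BagMinHash snippet describes as computing signatures that allow direct estimation of Jaccard similarities. The sketch below shows that basic idea only: classic MinHash with k independent hash functions, where the fraction of matching signature positions estimates |A ∩ B| / |A ∪ B|. It is not the densified one-permutation scheme, BagMinHash, or ProbMinHash from the listed papers, and it does not handle weighted sets. Assumptions (all hypothetical, chosen for illustration): the function names, the use of `zlib.crc32` to map elements to integers, and the "(a·x + b) mod p" hash family as an approximation of random permutations.

```python
import random
import zlib

# Mersenne prime 2^61 - 1, used as the modulus of the (assumed) hash family
PRIME = (1 << 61) - 1


def make_hash_params(k, seed=0):
    """Draw k hypothetical (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def minhash_signature(items, params):
    """Return one minimum hash value per hash function (classic k-hash MinHash)."""
    if not items:
        raise ValueError("MinHash is undefined for an empty set")
    # Map each element to a 32-bit integer once; CRC32 is an illustrative choice
    xs = [zlib.crc32(str(x).encode("utf-8")) for x in items]
    return [min((a * x + b) % PRIME for x in xs) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where the two signatures agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must use the same hash functions")
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B| for comparison."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    params = make_hash_params(256, seed=42)
    A = {f"w{i}" for i in range(0, 80)}
    B = {f"w{i}" for i in range(20, 100)}
    print("exact:", exact_jaccard(A, B))  # 60 shared / 100 total = 0.6
    print("estimate:", estimate_jaccard(minhash_signature(A, params),
                                        minhash_signature(B, params)))
```

With k hash functions, the estimate's standard deviation is roughly sqrt(J(1 - J) / k), so larger signatures trade memory and hashing time for accuracy. Reducing that per-element cost of k hash evaluations is the kind of problem densification and the other listed methods address.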