A survey on locality sensitive hashing algorithms and their applications
Finding nearest neighbors in high-dimensional spaces is a fundamental operation in many
diverse application domains. Locality Sensitive Hashing (LSH) is one of the most popular …
Improving malicious URLs detection via feature engineering: Linear and nonlinear space transformation methods
In malicious URLs detection, traditional classifiers are challenged because the data volume
is huge, patterns are changing over time, and the correlations among features are …
A review for weighted minhash algorithms
Data similarity (or distance) computation is a fundamental research topic which underpins
many high-level applications based on similarity measures in machine learning and data …
Refining codes for locality sensitive hashing
Learning to hash is of particular interest in information retrieval for large-scale data due to its
high efficiency and effectiveness. Most studies in hashing concentrate on constructing new …
Serving deep learning models with deduplication from relational databases
There are significant benefits to serving deep learning models from relational databases. First,
features extracted from databases do not need to be transferred to any decoupled deep …
A fast LSH-based similarity search method for multivariate time series
Due to advances in mobile devices and sensors, there has been an increasing interest in
the analysis of multivariate time series. Identifying similar time series is a core subroutine in …
PM-LSH: a fast and accurate in-memory framework for high-dimensional approximate NN and closest pair search
Nearest neighbor (NN) search is inherently computationally expensive in high-dimensional
spaces due to the curse of dimensionality. As a well-known solution, locality-sensitive …
An effective and scalable framework for authorship attribution query processing
Authorship attribution aims at identifying the original author of an anonymous text from a
given set of candidate authors and has a wide range of applications. The main challenge in …
Improved consistent weighted sampling revisited
Min-Hash is a popular technique for efficiently estimating the Jaccard similarity of binary
sets. Consistent Weighted Sampling (CWS) generalizes the Min-Hash scheme to sketch …
A Survey on Efficient Processing of Similarity Queries over Neural Embeddings
Y Wang - arXiv preprint arXiv:2204.07922, 2022 - arxiv.org
Similarity query is the family of queries based on some similarity metrics. Unlike the
traditional database queries which are mostly based on value equality, similarity queries aim …