xStream: Outlier detection in feature-evolving data streams

E Manzoor, H Lamba, L Akoglu - Proceedings of the 24th ACM SIGKDD …, 2018 - dl.acm.org
This work addresses the outlier detection problem for feature-evolving streams, which has
not been studied before. In this setting both (1) data points may evolve, with feature values …

Optimal densification for fast and accurate minwise hashing

A Shrivastava - International Conference on Machine …, 2017 - proceedings.mlr.press
Minwise hashing is a fundamental and one of the most successful hashing algorithms in the
literature. Recent advances based on the idea of densification (Shrivastava & Li, 2014) have …

Hashing, load balancing and multiple choice

U Wieder - … and Trends® in Theoretical Computer Science, 2017 - nowpublishers.com
Many tasks in computer systems could be abstracted as distributing items into buckets, so
that the allocation of items across buckets is as balanced as possible, and furthermore …

Fast similarity sketching

S Dahlgaard, MBT Knudsen… - 2017 IEEE 58th Annual …, 2017 - ieeexplore.ieee.org
We consider the Similarity Sketching problem: Given a universe [u] = {0, ..., u-1} we want a
random function S mapping subsets A of [u] into vectors S(A) of size t, such that similarity is …

Practical hash functions for similarity estimation and dimensionality reduction

S Dahlgaard, M Knudsen… - Advances in Neural …, 2017 - proceedings.neurips.cc
Hashing is a basic tool for dimensionality reduction employed in several aspects of machine
learning. However, the performance analysis is often carried out under the abstract …

Fully understanding the hashing trick

CB Freksen, L Kamma… - Advances in Neural …, 2018 - proceedings.neurips.cc
Feature hashing, also known as the hashing trick, introduced by Weinberger et
al. (2009), is one of the key techniques used in scaling-up machine learning algorithms …

Binary vectors for fast distance and similarity estimation

DA Rachkovskij - Cybernetics and Systems Analysis, 2017 - Springer
This review considers methods and algorithms for fast estimation of distance/similarity
measures between initial data from vector representations with binary or integer-valued …

Invertible Bloom lookup tables with less memory and randomness

N Fleischhacker, KG Larsen… - … on Algorithms (ESA …, 2024 - drops.dagstuhl.de
In this work we study Invertible Bloom Lookup Tables (IBLTs) with small failure probabilities.
IBLTs are highly versatile data structures that have found applications in set reconciliation …

Fast and powerful hashing using tabulation

M Thorup - Communications of the ACM, 2017 - dl.acm.org
Randomized algorithms are often enjoyed for their simplicity, but the hash functions
employed to yield the desired probabilistic guarantees are often too complicated to be …

Load balancing with dynamic set of balls and bins

A Aamand, JBT Knudsen, M Thorup - … of the 53rd Annual ACM SIGACT …, 2021 - dl.acm.org
In dynamic load balancing, we wish to distribute balls into bins in an environment where
both balls and bins can be added and removed. We want to minimize the maximum load of …