Differentially private hierarchical clustering with provable approximation guarantees
Hierarchical Clustering is a popular unsupervised machine learning method with decades of
history and numerous applications. We initiate the study of differentially-private …
Improving dual-encoder training through dynamic indexes for negative mining
Dual encoder models are ubiquitous in modern classification and retrieval. Crucial for
training such dual encoders is an accurate estimation of gradients from the partition function …
TeraHAC: Hierarchical agglomerative clustering of trillion-edge graphs
We introduce TeraHAC, a (1+ε)-approximate hierarchical agglomerative clustering (HAC)
algorithm which scales to trillion-edge graphs. Our algorithm is based on a new approach to …
Discovering personalized characteristic communities in attributed graphs
What is the widest community in which a person exercises a strong impact? Although
extensive attention has been devoted to searching communities containing given …
It's Hard to HAC with Average Linkage!
Average linkage Hierarchical Agglomerative Clustering (HAC) is an extensively studied and
applied method for hierarchical clustering. Recent applications to massive datasets have …
Sub-quadratic (1+ε)-approximate Euclidean Spanners, with Applications
We study graph spanners for point-set in the high-dimensional Euclidean space. On the one
hand, we prove that spanners with stretch $< \sqrt{2}$ and subquadratic size are not possible …
Sub-quadratic (1+ϵ)-approximate Euclidean Spanners, with Applications
We study graph spanners for point-set in the high-dimensional Euclidean space. On the one
hand, we prove that spanners with stretch $< \sqrt{2}$ and subquadratic size are not possible, even if …
The ParClusterers Benchmark Suite (PCBS): A Fine-Grained Analysis of Scalable Graph Clustering
We introduce the ParClusterers Benchmark Suite (PCBS)--a collection of highly scalable
parallel graph clustering algorithms and benchmarking tools that streamline comparing …
DynHAC: Fully Dynamic Approximate Hierarchical Agglomerative Clustering
We consider the problem of maintaining a hierarchical agglomerative clustering (HAC) in the
dynamic setting, when the input is subject to point insertions and deletions. We introduce …
Efficient Centroid-Linkage Clustering
We give an efficient algorithm for Centroid-Linkage Hierarchical Agglomerative Clustering
(HAC), which computes a $c$-approximate clustering in roughly $n^{1+O(1/c^2)}$ time …