Differentially private hierarchical clustering with provable approximation guarantees

J Imola, A Epasto, M Mahdian… - International …, 2023 - proceedings.mlr.press
Hierarchical Clustering is a popular unsupervised machine learning method with decades of
history and numerous applications. We initiate the study of differentially-private …

Improving dual-encoder training through dynamic indexes for negative mining

N Monath, M Zaheer, K Allen… - … Conference on Artificial …, 2023 - proceedings.mlr.press
Dual encoder models are ubiquitous in modern classification and retrieval. Crucial for
training such dual encoders is an accurate estimation of gradients from the partition function …

TeraHAC: Hierarchical agglomerative clustering of trillion-edge graphs

L Dhulipala, J Łącki, J Lee, V Mirrokni - … of the ACM on Management of …, 2023 - dl.acm.org
We introduce TeraHAC, a (1+ε)-approximate hierarchical agglomerative clustering (HAC)
algorithm which scales to trillion-edge graphs. Our algorithm is based on a new approach to …

Discovering personalized characteristic communities in attributed graphs

Y Niu, Y Li, P Karras, Y Wang… - 2024 IEEE 40th …, 2024 - ieeexplore.ieee.org
What is the widest community in which a person exercises a strong impact? Although
extensive attention has been devoted to searching communities containing given …

It's Hard to HAC with Average Linkage!

MH Bateni, L Dhulipala, KN Gowda… - arXiv preprint arXiv …, 2024 - arxiv.org
Average linkage Hierarchical Agglomerative Clustering (HAC) is an extensively studied and
applied method for hierarchical clustering. Recent applications to massive datasets have …

Sub-quadratic (1+ε)-approximate Euclidean Spanners, with Applications

A Andoni, H Zhang - arXiv preprint arXiv:2310.05315, 2023 - arxiv.org
We study graph spanners for point-set in the high-dimensional Euclidean space. On the one
hand, we prove that spanners with stretch $<\sqrt{2}$ and subquadratic size are not possible …

Sub-quadratic (1+ϵ)-approximate Euclidean Spanners, with Applications

A Andoni, H Zhang - 2023 IEEE 64th Annual Symposium on …, 2023 - ieeexplore.ieee.org
We study graph spanners for point-set in the high-dimensional Euclidean space. On the one
hand, we prove that spanners with stretch $<\sqrt{2}$ and subquadratic size are not possible, even if …

The ParClusterers Benchmark Suite (PCBS): A Fine-Grained Analysis of Scalable Graph Clustering

S Yu, J Shi, J Meindl, D Eisenstat, X Ju… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce the ParClusterers Benchmark Suite (PCBS), a collection of highly scalable
parallel graph clustering algorithms and benchmarking tools that streamline comparing …

DynHAC: Fully Dynamic Approximate Hierarchical Agglomerative Clustering

S Yu, L Dhulipala, J Łącki, N Parotsidis - arXiv preprint arXiv:2501.07745, 2025 - arxiv.org
We consider the problem of maintaining a hierarchical agglomerative clustering (HAC) in the
dynamic setting, when the input is subject to point insertions and deletions. We introduce …

Efficient Centroid-Linkage Clustering

MH Bateni, L Dhulipala, W Fletcher, KN Gowda… - arXiv preprint arXiv …, 2024 - arxiv.org
We give an efficient algorithm for Centroid-Linkage Hierarchical Agglomerative Clustering
(HAC), which computes a $c$-approximate clustering in roughly $n^{1+O(1/c^2)}$ time …