Improved Coresets for Euclidean k-Means

V Cohen-Addad, K Green Larsen… - Advances in …, 2022 - proceedings.neurips.cc
Given a set of $n$ points in $d$ dimensions, the Euclidean $k$-means problem (resp.
Euclidean $k$-median) consists of finding $k$ centers such that the sum of squared …

Towards optimal lower bounds for k-median and k-means coresets

V Cohen-Addad, KG Larsen, D Saulpic… - Proceedings of the 54th …, 2022 - dl.acm.org
The (k, z)-clustering problem consists of finding a set of k points called centers, such that the
sum of distances raised to the power of z of every data point to its closest center is …

The power of uniform sampling for coresets

V Braverman, V Cohen-Addad… - 2022 IEEE 63rd …, 2022 - ieeexplore.ieee.org
Motivated by practical generalizations of the classic k-median and k-means objectives, such
as clustering with size constraints, fair clustering, and Wasserstein barycenter, we introduce …

Coverage-centric coreset selection for high pruning rates

H Zheng, R Liu, F Lai, A Prakash - arXiv preprint arXiv:2210.15809, 2022 - arxiv.org
One-shot coreset selection aims to select a representative subset of the training data, given
a pruning rate, that can later be used to train future models while retaining high accuracy …

Improved coresets and sublinear algorithms for power means in euclidean spaces

V Cohen-Addad, D Saulpic… - Advances in Neural …, 2021 - proceedings.neurips.cc
In this paper, we consider the problem of finding high dimensional power means: given a set
$A$ of $n$ points in $\mathbb{R}^d$, find the point $m$ that minimizes the sum of Euclidean …

New subset selection algorithms for low rank approximation: Offline and online

DP Woodruff, T Yasuda - Proceedings of the 55th Annual ACM …, 2023 - dl.acm.org
Subset selection for the rank k approximation of an n × d matrix A offers improvements in the
interpretability of matrices, as well as a variety of computational savings. This problem is well …

Coresets for Vertical Federated Learning: Regularized Linear Regression and K-Means Clustering

L Huang, Z Li, J Sun, H Zhao - Advances in Neural …, 2022 - proceedings.neurips.cc
Vertical federated learning (VFL), where data features are stored in multiple parties
distributively, is an important area in machine learning. However, the communication …

Streaming Euclidean k-median and k-means with o(log n) Space

V Cohen-Addad, DP Woodruff… - 2023 IEEE 64th Annual …, 2023 - ieeexplore.ieee.org
We consider the classic Euclidean k-median and k-means objective on data streams, where
the goal is to provide a (1+ε)-approximation to the optimal k-median or k-means solution …

Tight bounds for volumetric spanners and applications

A Bhaskara, S Mahabadi… - Advances in Neural …, 2024 - proceedings.neurips.cc
Given a set of points of interest, a volumetric spanner is a subset of the points using which all
the points can be expressed using "small" coefficients (measured in an appropriate norm) …

Near-Optimal k-Clustering in the Sliding Window Model

D Woodruff, P Zhong, S Zhou - Advances in Neural …, 2024 - proceedings.neurips.cc
Clustering is an important technique for identifying structural information in large-scale data
analysis, where the underlying dataset may be too large to store. In many applications …