Clustering data streams
Clustering is a useful and ubiquitous tool in data analysis. Broadly speaking, clustering is
the problem of grouping a data set into several groups such that, under some definition of …
A unified framework for approximating and clustering data
Given a set F of n positive functions over a ground set X, we consider the problem of
computing x* that minimizes the expression ∑_{f∈F} f(x), over x ∈ X. A typical application is …
Yinyang k-means: A drop-in replacement of the classic k-means with consistent speedup
This paper presents Yinyang K-means, a new algorithm for K-means clustering. By
clustering the centers in the initial stage, and leveraging efficiently maintained lower and …
Improved coresets and sublinear algorithms for power means in euclidean spaces
V Cohen-Addad, D Saulpic… - Advances in Neural …, 2021 - proceedings.neurips.cc
In this paper, we consider the problem of finding high dimensional power means: given a set
$A$ of $n$ points in $\mathbb{R}^d$, find the point $m$ that minimizes the sum of Euclidean …
Training gaussian mixture models at scale via coresets
How can we train a statistical mixture model on a massive data set? In this work we show
how to construct coresets for mixtures of Gaussians. A coreset is a weighted subset of the …
[BOOK] Robust cluster analysis and variable selection
G Ritter - 2014 - books.google.com
Clustering remains a vibrant area of research in statistics. Although there are many books on
this topic, there are relatively few that are well founded in the theoretical aspects. In Robust …
Scalable training of mixture models via coresets
How can we train a statistical mixture model on a massive data set? In this paper, we show
how to construct coresets for mixtures of Gaussians and natural generalizations. A coreset is …
Approximating the minimum vertex cover in sublinear time and a connection to distributed algorithms
M Parnas, D Ron - Theoretical Computer Science, 2007 - Elsevier
For a given graph G over n vertices, let OPT_G denote the size of an optimal solution in G of a
particular minimization problem (e.g., the size of a minimum vertex cover). A randomized …
Private coresets
A coreset of a point set P is a small weighted set of points that captures some geometric
properties of P. Coresets have found use in a vast host of geometric settings. We forge a link …
On approximability of clustering problems without candidate centers
The k-means objective is arguably the most widely-used cost function for modeling
clustering tasks in a metric space. In practice and historically, k-means is thought of in a …