Clustering data streams

S Guha, N Mishra - Data stream management: processing high-speed data …, 2016 - Springer
Clustering is a useful and ubiquitous tool in data analysis. Broadly speaking, clustering is
the problem of grouping a data set into several groups such that, under some definition of …

A unified framework for approximating and clustering data

D Feldman, M Langberg - Proceedings of the forty-third annual ACM …, 2011 - dl.acm.org
Given a set $F$ of $n$ positive functions over a ground set $X$, we consider the problem of
computing $x^*$ that minimizes the expression $\sum_{f \in F} f(x)$ over $x \in X$. A typical application is …
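The snippet above states only the optimization problem. As a minimal sketch of that problem statement (not of the framework the paper develops), assume a finite ground set X and find x* by exhaustive search. The helper `minimize_sum` and the example functions are hypothetical.

```python
from typing import Callable, Iterable, Sequence, TypeVar

T = TypeVar("T")


def minimize_sum(functions: Sequence[Callable[[T], float]], ground_set: Iterable[T]) -> T:
    """Hypothetical helper: return the x in ground_set minimizing sum(f(x) for f in functions).

    Exhaustive search over a finite ground set; ties go to the first minimizer found.
    """
    return min(ground_set, key=lambda x: sum(f(x) for f in functions))


# Example: F = {f_p(x) = |x - p| + 1} for three hypothetical points p, and X = {0, ..., 10}.
points = [1, 2, 10]
F = [lambda x, p=p: abs(x - p) + 1 for p in points]  # p=p binds each p when the lambda is created
best = minimize_sum(F, range(11))
print(best)  # 2, the median of the points
```

Exhaustive search evaluates every function at every point of X, i.e. |F|·|X| evaluations, so it is only practical for tiny instances.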

Yinyang k-means: A drop-in replacement of the classic k-means with consistent speedup

Y Ding, Y Zhao, X Shen, M Musuvathi… - … on machine learning, 2015 - proceedings.mlr.press
This paper presents Yinyang K-means, a new algorithm for K-means clustering. By
clustering the centers in the initial stage, and leveraging efficiently maintained lower and …

Improved coresets and sublinear algorithms for power means in Euclidean spaces

V Cohen-Addad, D Saulpic… - Advances in Neural …, 2021 - proceedings.neurips.cc
In this paper, we consider the problem of finding high dimensional power means: given a set
$A$ of $n$ points in $\mathbb{R}^d$, find the point $m$ that minimizes the sum of Euclidean …

Training Gaussian mixture models at scale via coresets

M Lucic, M Faulkner, A Krause, D Feldman - Journal of Machine Learning …, 2018 - jmlr.org
How can we train a statistical mixture model on a massive data set? In this work we show
how to construct coresets for mixtures of Gaussians. A coreset is a weighted subset of the …

[BOOK] Robust cluster analysis and variable selection

G Ritter - 2014 - books.google.com
Clustering remains a vibrant area of research in statistics. Although there are many books on
this topic, there are relatively few that are well founded in the theoretical aspects. In Robust …

Scalable training of mixture models via coresets

D Feldman, M Faulkner… - Advances in neural …, 2011 - proceedings.neurips.cc
How can we train a statistical mixture model on a massive data set? In this paper, we show
how to construct coresets for mixtures of Gaussians and natural generalizations. A coreset is …

Approximating the minimum vertex cover in sublinear time and a connection to distributed algorithms

M Parnas, D Ron - Theoretical Computer Science, 2007 - Elsevier
For a given graph $G$ over $n$ vertices, let $\mathrm{OPT}_G$ denote the size of an optimal solution in $G$ of a
particular minimization problem (e.g., the size of a minimum vertex cover). A randomized …

Private coresets

D Feldman, A Fiat, H Kaplan, K Nissim - … of the forty-first annual ACM …, 2009 - dl.acm.org
A coreset of a point set P is a small weighted set of points that captures some geometric
properties of P. Coresets have found use in a vast host of geometric settings. We forge a link …
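To make "small weighted set of points" concrete, here is a minimal sketch, assuming the geometric property of interest is the k-means cost of P for a fixed set of centers. It uses plain uniform sampling with weight n/m per sampled point, a simple stand-in rather than the construction from any of the papers listed here; all function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point],
                weights: Optional[Sequence[float]] = None) -> float:
    """Weighted k-means cost: each point pays weight * squared distance to its nearest center."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(sq_dist(p, c) for c in centers) for p, w in zip(points, weights))


def uniform_coreset(points: Sequence[Point], m: int, seed: int = 0) -> Tuple[List[Point], List[float]]:
    """Hypothetical helper: sample m points without replacement, each weighted n / m."""
    rng = random.Random(seed)
    sample = rng.sample(list(points), m)
    weight = len(points) / m
    return sample, [weight] * m


# Usage: compare the full cost with the weighted-sample estimate for one fixed set of centers.
gen = random.Random(1)
points = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(1000)]
centers = [(0.0, 0.0), (3.0, 3.0)]
sample, weights = uniform_coreset(points, 100)
print(kmeans_cost(points, centers), kmeans_cost(sample, centers, weights))
```

Each point is sampled with probability m/n and weighted n/m, so for any fixed set of centers the weighted cost is an unbiased estimate of the full cost. Uniform sampling gives no worst-case guarantee, though: a few far-away points can dominate the cost and are easily missed.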

On approximability of clustering problems without candidate centers

V Cohen-Addad, CS Karthik, E Lee - Proceedings of the 2021 ACM-SIAM …, 2021 - SIAM
The k-means objective is arguably the most widely-used cost function for modeling
clustering tasks in a metric space. In practice and historically, k-means is thought of in a …
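A small sketch of the k-means objective (the sum of squared Euclidean distances from each point to its nearest center) that contrasts centers placed anywhere in space with centers restricted to a finite candidate set. Taking the input points themselves as the candidates is an assumption made here for illustration, and the helper names are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """k-means objective: each point pays its squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean; the optimal single center when centers are unrestricted."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def best_candidate_cost(points: Sequence[Point], candidates: Sequence[Point], k: int) -> float:
    """Hypothetical helper: exhaustively choose k centers from a finite candidate set."""
    return min(kmeans_cost(points, list(cs)) for cs in combinations(candidates, k))


square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
free_cost = kmeans_cost(square, [centroid(square)])   # center at (1, 1)
cand_cost = best_candidate_cost(square, square, k=1)  # best center is a corner
print(free_cost, cand_cost)  # 8.0 16.0
```

For k = 1 the centroid is always optimal, so restricting centers to a candidate set can only raise the cost. The exhaustive search over `combinations` grows combinatorially in k and is meant only for toy inputs.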