Improved Coresets for Euclidean $ k $-Means
Given a set of $ n $ points in $ d $ dimensions, the Euclidean $ k $-means problem (resp.
Euclidean $ k $-median) consists of finding $ k $ centers such that the sum of squared …
Towards optimal lower bounds for k-median and k-means coresets
The (k, z)-clustering problem consists of finding a set of k points called centers, such that the
sum of distances raised to the power of z of every data point to its closest center is …
Practical coreset constructions for machine learning
We investigate coresets (succinct, small summaries of large data sets) so that solutions found
on the summary are provably competitive with solutions found on the full data set. We provide …
Coresets and sketches
JM Phillips - Handbook of discrete and computational geometry, 2017 - taylorfrancis.com
Geometric data summarization has become an essential tool in both geometric
approximation algorithms and where geometry intersects with big data problems. In linear or …
Randomized sketches of convex programs with sharp guarantees
Random projection (RP) is a classical technique for reducing storage and computational
costs. We analyze RP-based approximations of convex programs, in which the original …
Materialization optimizations for feature selection workloads
There is an arms race in the data management industry to support statistical analytics.
Feature selection, the process of selecting a feature set that will be used to build a statistical …
Dimmwitted: A study of main-memory statistical analytics
We perform the first study of the tradeoff space of access methods and replication to support
statistical analytics using first-order methods executed in the main memory of a Non-Uniform …
Improved matrix algorithms via the subsampled randomized Hadamard transform
Several recent randomized linear algebra algorithms rely upon fast dimension reduction
methods. A popular choice is the subsampled randomized Hadamard transform (SRHT). In …
Coresets for Vertical Federated Learning: Regularized Linear Regression and $ K $-Means Clustering
Vertical federated learning (VFL), where data features are stored in multiple parties
distributively, is an important area in machine learning. However, the communication …
DeepDive: a data management system for automatic knowledge base construction
C Zhang - 2015 - search.proquest.com
Many pressing questions in science are macroscopic: they require scientists to consult
information expressed in a wide range of resources, many of which are not organized in a …