Deja vu: Contextual sparsity for efficient LLMs at inference time

Z Liu, J Wang, T Dao, T Zhou, B Yuan… - International …, 2023 - proceedings.mlr.press
Large language models (LLMs) with hundreds of billions of parameters have sparked a new
wave of exciting AI applications. However, they are computationally expensive at inference …

Sketching as a tool for numerical linear algebra

DP Woodruff - … and Trends® in Theoretical Computer Science, 2014 - nowpublishers.com
This survey highlights the recent advances in algorithms for numerical linear algebra that
have come from the technique of linear sketching, whereby given a matrix, one first …
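The sketch-and-solve idea behind this line of work can be illustrated with a small toy example: compress a tall least-squares problem with a CountSketch transform, then solve the much smaller sketched problem. This is an illustrative NumPy sketch under assumed dimensions (n = 2000, d = 4, sketch size m = 400), not code from the survey:

```python
import numpy as np

rng = np.random.default_rng(0)

def countsketch(A, m, rng):
    """Apply a CountSketch transform S (m x n) to A in O(nnz(A)) time:
    each row of A is hashed to one of m buckets with a random +-1 sign."""
    n = A.shape[0]
    buckets = rng.integers(0, m, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    SA = np.zeros((m, A.shape[1]))
    np.add.at(SA, buckets, signs[:, None] * A)  # unbuffered scatter-add
    return SA

n, d, m = 2000, 4, 400
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Sketch [A | b] with the same S so the least-squares geometry is preserved.
SAb = countsketch(np.hstack([A, b[:, None]]), m, rng)
SA, Sb = SAb[:, :d], SAb[:, d]

x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)   # full problem
x_sk, *_ = np.linalg.lstsq(SA, Sb, rcond=None)  # sketched problem

res_opt = np.linalg.norm(A @ x_opt - b)
res_sk = np.linalg.norm(A @ x_sk - b)
print(res_sk / res_opt)  # near 1 when S is a subspace embedding
```

With high probability the sketched solution's residual is within a (1 + ε) factor of optimal while the solve touches only an m × d system; the sketch size m grows roughly with d²/ε² for CountSketch.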

Minimum cost flows, MDPs, and ℓ1-regression in nearly linear time for dense instances

J Van Den Brand, YT Lee, YP Liu, T Saranurak… - Proceedings of the 53rd …, 2021 - dl.acm.org
In this paper we provide new randomized algorithms with improved runtimes for solving
linear programs with two-sided constraints. In the special case of the minimum cost flow …

Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression

X Meng, MW Mahoney - Proceedings of the forty-fifth annual ACM …, 2013 - dl.acm.org
Low-distortion embeddings are critical building blocks for developing random sampling and
random projection algorithms for common linear algebra problems. We show that, given a …

Low rank approximation with entrywise ℓ1-norm error

Z Song, DP Woodruff, P Zhong - Proceedings of the 49th Annual ACM …, 2017 - dl.acm.org
We study the ℓ1-low rank approximation problem, where for a given n × d matrix A and
approximation factor α ≤ 1, the goal is to output a rank-k matrix Â for which ‖A − Â‖₁ ≤ α …

On coresets for logistic regression

A Munteanu, C Schwiegelshohn… - Advances in …, 2018 - proceedings.neurips.cc
Coresets are one of the central methods to facilitate the analysis of large data. We continue
a recent line of research applying the theory of coresets to logistic regression. First, we show …

Compressing neural networks: Towards determining the optimal layer-wise decomposition

L Liebenwein, A Maalouf… - Advances in Neural …, 2021 - proceedings.neurips.cc
We present a novel global compression framework for deep neural networks that
automatically analyzes each layer to identify the optimal per-layer compression ratio, while …

New subset selection algorithms for low rank approximation: Offline and online

DP Woodruff, T Yasuda - Proceedings of the 55th Annual ACM …, 2023 - dl.acm.org
Subset selection for the rank-k approximation of an n × d matrix A offers improvements in the
interpretability of matrices, as well as a variety of computational savings. This problem is well …

ℓp Row Sampling by Lewis Weights

MB Cohen, R Peng - Proceedings of the forty-seventh annual ACM …, 2015 - dl.acm.org
We give a simple algorithm to efficiently sample the rows of a matrix while preserving the p-norms
of its product with vectors. Given an n × d matrix A, we find with high probability and in …
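The ℓp Lewis weights underlying this sampling scheme can be computed by the fixed-point iteration w_i ← (a_iᵀ(AᵀW^(1−2/p)A)⁻¹a_i)^(p/2), which contracts for p < 4. Below is a minimal illustrative NumPy implementation (the dimensions and iteration count are assumptions for the demo, not from the paper):

```python
import numpy as np

def lewis_weights(A, p=1.0, iters=100):
    """Approximate the lp Lewis weights of A via the fixed-point iteration
    w_i <- (a_i^T (A^T W^(1-2/p) A)^{-1} a_i)^(p/2); converges for p < 4."""
    n, d = A.shape
    w = np.ones(n)
    for _ in range(iters):
        W = w ** (1.0 - 2.0 / p)           # diagonal reweighting, as a vector
        M = A.T @ (W[:, None] * A)          # A^T W^(1-2/p) A
        Minv = np.linalg.inv(M)
        tau = np.einsum('ij,jk,ik->i', A, Minv, A)  # a_i^T M^{-1} a_i
        w = tau ** (p / 2.0)
    return w

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
w = lewis_weights(A, p=1.0)
print(w.sum())  # Lewis weights sum to d, like leverage scores for p = 2
```

Sampling each row with probability proportional to w_i (and rescaling) then yields, with high probability, a row subset whose product with any vector preserves the ℓp norm up to small distortion.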

Coresets-methods and history: A theoreticians design pattern for approximation and streaming algorithms

A Munteanu, C Schwiegelshohn - KI-Künstliche Intelligenz, 2018 - Springer
We present a technical survey on the state of the art approaches in data reduction and the
coreset framework. These include geometric decompositions, gradient methods, random …