H2o: Heavy-hitter oracle for efficient generative inference of large language models
Abstract Large Language Models (LLMs), despite their recent impressive accomplishments,
are notably cost-prohibitive to deploy, particularly for applications involving long-content …
are notably cost-prohibitive to deploy, particularly for applications involving long-content …
In-context learning for attention scheme: from single softmax regression to multiple softmax regression via a tensor trick
Large language models (LLMs) have brought significant and transformative changes in
human society. These models have demonstrated remarkable capabilities in natural …
human society. These models have demonstrated remarkable capabilities in natural …
Training multi-layer over-parametrized neural network in subquadratic time
We consider the problem of training a multi-layer over-parametrized neural network to
minimize the empirical risk induced by a loss function. In the typical setting of over …
minimize the empirical risk induced by a loss function. In the typical setting of over …
A tighter complexity analysis of sparsegpt
In this work, we improved the analysis of the running time of SparseGPT [Frantar, Alistarh
ICML 2023] from $ O (d^{3}) $ to $ O (d^{\omega}+ d^{2+ a+ o (1)}+ d^{1+\omega (1, 1, a)-a}) …
ICML 2023] from $ O (d^{3}) $ to $ O (d^{\omega}+ d^{2+ a+ o (1)}+ d^{1+\omega (1, 1, a)-a}) …
Fast quantum algorithm for attention computation
Large language models (LLMs) have demonstrated exceptional performance across a wide
range of tasks. These models, powered by advanced deep learning techniques, have …
range of tasks. These models, powered by advanced deep learning techniques, have …
Solving attention kernel regression problem via pre-conditioner
Attention mechanism is the key to large language models, and attention matrix serves as an
algorithmic and computational bottleneck for such a scheme. In this paper, we define two …
algorithmic and computational bottleneck for such a scheme. In this paper, we define two …
An iterative algorithm for rescaled hyperbolic functions regression
Large language models (LLMs) have numerous real-life applications across various
domains, such as natural language translation, sentiment analysis, language modeling …
domains, such as natural language translation, sentiment analysis, language modeling …
A nearly-linear time algorithm for structured support vector machines
Quadratic programming is a fundamental problem in the field of convex optimization. Many
practical tasks can be formulated as quadratic programming, for example, the support vector …
practical tasks can be formulated as quadratic programming, for example, the support vector …
Convergence of two-layer regression with nonlinear units
Large language models (LLMs), such as ChatGPT and GPT4, have shown outstanding
performance in many human life task. Attention computation plays an important role in …
performance in many human life task. Attention computation plays an important role in …
Convex Minimization with Integer Minima in Time
Given a convex function $ f $ on $\mathbb {R}^ n $ with an integer minimizer, we show how
to find an exact minimizer of $ f $ using $ O (n^ 2\log n) $ calls to a separation oracle and …
to find an exact minimizer of $ f $ using $ O (n^ 2\log n) $ calls to a separation oracle and …