Z Zhang, Y Sheng, T Zhou, T Chen… - Advances in …, 2024 - proceedings.neurips.cc

Large Language Models (LLMs), despite their recent impressive accomplishments,
are notably cost-prohibitive to deploy, particularly for applications involving long-content …

Cited by 274

In-context learning for attention scheme: from single softmax regression to multiple softmax regression via a tensor trick

Y Gao, Z Song, S Xie - arXiv preprint arXiv:2307.02419, 2023 - arxiv.org

Large language models (LLMs) have brought significant and transformative changes in
human society. These models have demonstrated remarkable capabilities in natural …

Cited by 28

Training multi-layer over-parametrized neural network in subquadratic time

Z Song, L Zhang, R Zhang - arXiv preprint arXiv:2112.07628, 2021 - arxiv.org

We consider the problem of training a multi-layer over-parametrized neural network to
minimize the empirical risk induced by a loss function. In the typical setting of over …

Cited by 72

A tighter complexity analysis of SparseGPT

X Li, Y Liang, Z Shi, Z Song - arXiv preprint arXiv:2408.12151, 2024 - arxiv.org

In this work, we improved the analysis of the running time of SparseGPT [Frantar, Alistarh
ICML 2023] from $O(d^{3})$ to $O(d^{\omega} + d^{2+a+o(1)} + d^{1+\omega(1,1,a)-a})$ …

Cited by 20

Fast quantum algorithm for attention computation

Y Gao, Z Song, X Yang, R Zhang - arXiv preprint arXiv:2307.08045, 2023 - arxiv.org

Large language models (LLMs) have demonstrated exceptional performance across a wide
range of tasks. These models, powered by advanced deep learning techniques, have …

Cited by 25

Solving attention kernel regression problem via pre-conditioner

Z Song, J Yin, L Zhang - International Conference on …, 2024 - proceedings.mlr.press

Attention mechanism is the key to large language models, and attention matrix serves as an
algorithmic and computational bottleneck for such a scheme. In this paper, we define two …

Cited by 15

An iterative algorithm for rescaled hyperbolic functions regression

Y Gao, Z Song, J Yin - arXiv preprint arXiv:2305.00660, 2023 - arxiv.org

Large language models (LLMs) have numerous real-life applications across various
domains, such as natural language translation, sentiment analysis, language modeling …

Cited by 34

A nearly-linear time algorithm for structured support vector machines

Y Gu, Z Song, L Zhang - arXiv preprint arXiv:2307.07735, 2023 - arxiv.org

Quadratic programming is a fundamental problem in the field of convex optimization. Many
practical tasks can be formulated as quadratic programming, for example, the support vector …

Cited by 19

Convergence of two-layer regression with nonlinear units

Y Deng, Z Song, S Xie - arXiv preprint arXiv:2308.08358, 2023 - arxiv.org

Large language models (LLMs), such as ChatGPT and GPT-4, have shown outstanding
performance in many everyday tasks. Attention computation plays an important role in …

Cited by 9

Convex Minimization with Integer Minima in $\widetilde{O}(n^4)$ Time

H Jiang, YT Lee, Z Song, L Zhang - arXiv preprint arXiv:2304.03426, 2023 - arxiv.org

Given a convex function $f$ on $\mathbb{R}^n$ with an integer minimizer, we show how
to find an exact minimizer of $f$ using $O(n^2 \log n)$ calls to a separation oracle and …

Cited by 12

Saved to My library

Space-efficient interior point method, with applications to linear programming and maximum...

H2O: Heavy-hitter oracle for efficient generative inference of large language models
