Max-margin token selection in attention mechanism

D Ataee Tarzanagh, Y Li, X Zhang… - Advances in Neural …, 2023 - proceedings.neurips.cc
The attention mechanism is a central component of the transformer architecture, which led to the
phenomenal success of large language models. However, the theoretical principles …

Transformers as support vector machines

DA Tarzanagh, Y Li, C Thrampoulidis… - arXiv preprint arXiv …, 2023 - arxiv.org
Since its inception in "Attention Is All You Need", the transformer architecture has led to
revolutionary advancements in NLP. The attention layer within the transformer admits a …

A fast optimization view: Reformulating single layer attention in LLM based on tensor and SVM trick, and solving it in matrix multiplication time

Y Gao, Z Song, W Wang, J Yin - arXiv preprint arXiv:2309.07418, 2023 - arxiv.org
Large language models (LLMs) have played a pivotal role in revolutionizing various facets
of our daily existence. Solving attention regression is a fundamental task in optimizing LLMs …

Implicit bias of gradient descent on reparametrized models: On equivalence to mirror descent

Z Li, T Wang, JD Lee, S Arora - Advances in Neural …, 2022 - proceedings.neurips.cc
As part of the effort to understand the implicit bias of gradient descent in overparametrized
models, several results have shown how the training trajectory on the overparametrized …

The power and limitation of pretraining-finetuning for linear regression under covariate shift

J Wu, D Zou, V Braverman, Q Gu… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study linear regression under covariate shift, where the marginal distribution over the
input covariates differs in the source and the target domains, while the conditional …

Homogenization of SGD in high-dimensions: Exact dynamics and generalization properties

C Paquette, E Paquette, B Adlam… - Mathematical …, 2024 - Springer
We develop a stochastic differential equation, called homogenized SGD, for analyzing the
dynamics of stochastic gradient descent (SGD) on a high-dimensional random least squares …

Last iterate risk bounds of SGD with decaying stepsize for overparameterized linear regression

J Wu, D Zou, V Braverman, Q Gu… - … on Machine Learning, 2022 - proceedings.mlr.press
Stochastic gradient descent (SGD) has been shown to generalize well in many deep
learning applications. In practice, one often runs SGD with a geometrically decaying …

Implicit regularization or implicit conditioning? Exact risk trajectories of SGD in high dimensions

C Paquette, E Paquette, B Adlam… - Advances in Neural …, 2022 - proceedings.neurips.cc
Stochastic gradient descent (SGD) is a pillar of modern machine learning, serving as the
go-to optimization algorithm for a diverse array of problems. While the empirical success of SGD …

How transformers utilize multi-head attention in in-context learning? A case study on sparse linear regression

X Chen, L Zhao, D Zou - arXiv preprint arXiv:2408.04532, 2024 - arxiv.org
Despite the remarkable success of transformer-based models in various real-world tasks,
their underlying mechanisms remain poorly understood. Recent studies have suggested that …

Finite-sample analysis of learning high-dimensional single ReLU neuron

J Wu, D Zou, Z Chen, V Braverman… - International …, 2023 - proceedings.mlr.press
This paper considers the problem of learning a single ReLU neuron with squared loss (aka
ReLU regression) in the overparameterized regime, where the input dimension can exceed …