On the implicit bias in deep-learning algorithms

G Vardi - Communications of the ACM, 2023 - dl.acm.org
Deep learning has been highly successful in recent years and has led to dramatic
improvements in multiple domains …

Nonconvex optimization meets low-rank matrix factorization: An overview

Y Chi, YM Lu, Y Chen - IEEE Transactions on Signal …, 2019 - ieeexplore.ieee.org
Substantial progress has been made recently on developing provably accurate and efficient
algorithms for low-rank matrix factorization via nonconvex optimization. While conventional …
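The nonconvex approach surveyed here runs gradient descent directly on a factorized parameterization of the matrix. A minimal NumPy sketch of that idea, with rank, step size, and iteration count chosen for illustration rather than taken from the survey:

```python
# Minimal sketch (illustrative, not from the survey): gradient descent on the
# factorized objective f(U, V) = 0.5 * ||U V^T - M||_F^2 for a rank-r matrix M.
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 50, 40, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))  # ground-truth rank-r matrix

U = 0.1 * rng.standard_normal((n, r))   # small random initialization of both factors
V = 0.1 * rng.standard_normal((m, r))
step = 0.005                            # assumed step size, small enough for stability here

for _ in range(3000):
    R = U @ V.T - M                     # residual
    # gradients: df/dU = R V, df/dV = R^T U; update both factors simultaneously
    U, V = U - step * R @ V, V - step * R.T @ U

print("relative error:", np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))
```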

Trained transformers learn linear models in-context

R Zhang, S Frei, PL Bartlett - Journal of Machine Learning Research, 2024 - jmlr.org
Attention-based neural networks such as transformers have demonstrated a remarkable
ability to exhibit in-context learning (ICL): Given a short prompt sequence of tokens from an …
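As a rough illustration of the setting studied here (assumed details, not the paper's exact construction): an in-context prompt stacks labeled examples from a single linear task followed by an unlabeled query, and the transformer's in-context prediction is compared against ordinary least squares fit to the context examples.

```python
# Minimal sketch (assumptions, not the paper's code): constructing an in-context
# prompt for a linear task y = <w, x>, plus the least-squares baseline a trained
# transformer's in-context prediction is compared to.
import numpy as np

rng = np.random.default_rng(0)
d, n_context = 5, 20

w = rng.standard_normal(d)                 # task vector, fixed within one prompt
X = rng.standard_normal((n_context, d))    # in-context example inputs
y = X @ w                                  # their labels
x_query = rng.standard_normal(d)           # query input whose label is withheld

def pad(label):
    # pad a scalar label to dimension d so every token in the prompt has one width
    return np.concatenate(([label], np.zeros(d - 1)))

# Prompt = alternating (x_i, y_i) tokens followed by the query token.
prompt = np.stack([t for xi, yi in zip(X, y) for t in (xi, pad(yi))] + [x_query])

# Ordinary least squares on the context: the predictor an ICL-capable
# transformer is expected to (approximately) implement.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("OLS in-context prediction:", x_query @ w_hat, "true label:", x_query @ w)
```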

LoRA: Low-rank adaptation of large language models

EJ Hu, Y Shen, P Wallis, Z Allen-Zhu, Y Li, S Wang… - ICLR, 2022 - arxiv.org
The dominant paradigm of natural language processing consists of large-scale pre-training
on general domain data and adaptation to particular tasks or domains. As we pre-train larger …
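The adaptation idea itself is compact: keep the pretrained weight frozen and train only a low-rank additive update. A minimal PyTorch sketch of that parameterization (the rank r, scaling alpha, and random stand-in for the pretrained weight are illustrative assumptions):

```python
# Minimal sketch of low-rank adaptation: frozen weight W plus a trainable
# update (alpha / r) * B @ A, with A Gaussian-initialized and B zero-initialized.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        # stand-in for a pretrained weight; frozen during fine-tuning
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)  # low-rank factor A
        self.B = nn.Parameter(torch.zeros(out_features, r))        # B = 0, so the update starts at zero
        self.scaling = alpha / r

    def forward(self, x):
        base = x @ self.weight.T             # frozen pretrained path
        update = (x @ self.A.T) @ self.B.T   # low-rank adapter path
        return base + self.scaling * update

layer = LoRALinear(768, 768)
out = layer(torch.randn(4, 768))  # only A and B receive gradients during fine-tuning
```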

Neural networks can learn representations with gradient descent

A Damian, J Lee… - Conference on Learning …, 2022 - proceedings.mlr.press
Significant theoretical work has established that in specific regimes, neural networks trained
by gradient descent behave like kernel methods. However, in practice, it is known that …

Deep learning: a statistical viewpoint

PL Bartlett, A Montanari, A Rakhlin - Acta numerica, 2021 - cambridge.org
The remarkable practical success of deep learning has revealed some major surprises from
a theoretical perspective. In particular, simple gradient methods easily find near-optimal …

Towards understanding ensemble, knowledge distillation and self-distillation in deep learning

Z Allen-Zhu, Y Li - arXiv preprint arXiv:2012.09816, 2020 - arxiv.org
We formally study how ensembles of deep learning models can improve test accuracy, and
how the superior performance of an ensemble can be distilled into a single model using …
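For reference, a standard knowledge-distillation objective of the kind analyzed in this setting mixes a softened teacher-matching term with ordinary cross-entropy; the temperature and mixing weight below are illustrative assumptions, not values prescribed by the paper.

```python
# Minimal sketch of a standard knowledge-distillation loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft-target term: match the teacher's temperature-softened output distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy on the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)   # e.g. averaged logits of an ensemble
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```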

Understanding gradient descent on the edge of stability in deep learning

S Arora, Z Li, A Panigrahi - International Conference on …, 2022 - proceedings.mlr.press
Deep learning experiments by Cohen et al. (2021) using deterministic Gradient
Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR) and …
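The EoS phenomenon is usually observed by tracking the loss sharpness, i.e. the top Hessian eigenvalue, along the GD trajectory and comparing it to the stability threshold 2/LR. A self-contained PyTorch sketch of that measurement on a toy model (architecture, data, and learning rate are assumptions, not the paper's setup):

```python
# Minimal sketch: full-batch GD on a small network while monitoring sharpness
# (top Hessian eigenvalue, estimated by power iteration on Hessian-vector products).
import torch

torch.manual_seed(0)
X = torch.randn(64, 10)
y = torch.randn(64, 1)
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
params = list(model.parameters())
lr = 0.05

def loss_fn():
    return ((model(X) - y) ** 2).mean()

def sharpness(n_iter=20):
    """Estimate the top Hessian eigenvalue via power iteration on H·v products."""
    v = [torch.randn_like(p) for p in params]
    for _ in range(n_iter):
        grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
        hv = torch.autograd.grad(grads, params, grad_outputs=v)   # Hessian-vector product
        norm = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / norm for h in hv]
    grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
    hv = torch.autograd.grad(grads, params, grad_outputs=v)
    return sum((a * b).sum() for a, b in zip(hv, v)).item()          # Rayleigh quotient

for step in range(200):
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= lr * g                                              # deterministic GD step
    if step % 50 == 0:
        print(f"step {step}: loss {loss.item():.4f}, "
              f"sharpness {sharpness():.2f}, 2/LR = {2 / lr:.1f}")
```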

Robust training under label noise by over-parameterization

S Liu, Z Zhu, Q Qu, C You - International Conference on …, 2022 - proceedings.mlr.press
Recently, over-parameterized deep networks, with increasingly more network parameters
than training samples, have dominated the performance of modern machine learning …

Vision transformers provably learn spatial structure

S Jelassi, M Sander, Y Li - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Vision Transformers (ViTs) have recently achieved comparable or superior
performance to convolutional neural networks (CNNs) in computer vision. This empirical …