On the implicit bias in deep-learning algorithms
G Vardi - Communications of the ACM, 2023 - dl.acm.org
Deep learning has been highly successful in recent years and has led to dramatic
improvements in multiple domains …
Nonconvex optimization meets low-rank matrix factorization: An overview
Substantial progress has been made recently on developing provably accurate and efficient
algorithms for low-rank matrix factorization via nonconvex optimization. While conventional …
Trained transformers learn linear models in-context
Attention-based neural networks such as transformers have demonstrated a remarkable
ability to exhibit in-context learning (ICL): Given a short prompt sequence of tokens from an …
LoRA: Low-rank adaptation of large language models
The dominant paradigm of natural language processing consists of large-scale pre-training
on general domain data and adaptation to particular tasks or domains. As we pre-train larger …
Neural networks can learn representations with gradient descent
Significant theoretical work has established that in specific regimes, neural networks trained
by gradient descent behave like kernel methods. However, in practice, it is known that …
Deep learning: a statistical viewpoint
The remarkable practical success of deep learning has revealed some major surprises from
a theoretical perspective. In particular, simple gradient methods easily find near-optimal …
Towards understanding ensemble, knowledge distillation and self-distillation in deep learning
We formally study how ensembles of deep learning models can improve test accuracy, and
how the superior performance of an ensemble can be distilled into a single model using …
Understanding gradient descent on the edge of stability in deep learning
Deep learning experiments by Cohen et al. (2021) using deterministic Gradient
Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR) and …
Robust training under label noise by over-parameterization
Recently, over-parameterized deep networks, with increasingly more network parameters
than training samples, have dominated the performance of modern machine learning …
Vision transformers provably learn spatial structure
Vision Transformers (ViTs) have recently achieved comparable or superior
performance to convolutional neural networks (CNNs) in computer vision. This empirical …