Understanding and minimising outlier features in neural network training
Outlier Features (OFs) are neurons whose activation magnitudes significantly exceed the
average over a neural network's (NN) width. They are well known to emerge during standard …
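The definition above is easy to make concrete. A minimal sketch of an outlier-feature check, assuming a simple ratio test over per-neuron mean absolute activations (the threshold k and the metric are illustrative choices, not the paper's exact criterion):

```python
import torch

def find_outlier_features(acts: torch.Tensor, k: float = 6.0) -> torch.Tensor:
    """Flag outlier features in one layer's activations.

    acts: (batch, width) activations.
    Returns a boolean mask over the width dimension marking neurons whose
    mean absolute activation exceeds k times the layer-wide average.
    """
    per_neuron = acts.abs().mean(dim=0)   # mean |activation| per neuron
    layer_avg = per_neuron.mean()         # average over the layer's width
    return per_neuron > k * layer_avg

# Example: one neuron firing ~50x harder than the rest is flagged.
acts = torch.randn(32, 512)
acts[:, 7] *= 50.0
print(find_outlier_features(acts).nonzero())  # -> tensor([[7]])
```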
Cautious optimizers: Improving training with one line of code
AdamW has been the default optimizer for transformer pretraining. For many years, our
community has searched for faster and more stable optimizers, with only constrained positive …
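The "one line" the title refers to is a sign-agreement mask on the optimizer's update. A hedged PyTorch sketch of that masking step, following the paper's published pseudocode from memory (the rescaling constant is an assumption):

```python
import torch

def cautious_mask(update: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    """Zero out update coordinates whose sign disagrees with the current
    gradient, rescaling by the kept fraction so the overall step size is
    preserved. `update` is the raw optimizer step (e.g. Adam's
    m_hat / (sqrt(v_hat) + eps)) that will be *subtracted* from the weights.
    """
    mask = (update * grad > 0).to(update.dtype)       # keep sign-aligned entries
    return update * mask * (mask.numel() / (mask.sum() + 1))
```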
2 OLMo 2 Furious
Team OLMo, P Walsh, L Soldaini, D Groeneveld… - arXiv preprint arXiv …, 2024 - arxiv.org
We present OLMo 2, the next generation of our fully open language models. OLMo 2
includes dense autoregressive models with improved architecture and training recipe …
Grams: Gradient descent with adaptive momentum scaling
Y Cao, X Li, Z Song - arXiv preprint arXiv:2412.17107, 2024 - arxiv.org
We introduce Gradient Descent with Adaptive Momentum Scaling (Grams), a novel
optimization algorithm that decouples the direction and …
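A minimal sketch of the decoupling described in the snippet, assuming the step direction comes from the gradient's sign and the magnitude from Adam's preconditioned update; this is one reading of the abstract, not the paper's verified algorithm:

```python
def grams_update(grad, m_hat, v_hat, eps=1e-8):
    """Toy Grams-style step: Adam supplies the per-coordinate magnitude,
    the raw gradient supplies the sign. All tensors share grad's shape."""
    adam_step = m_hat / (v_hat.sqrt() + eps)  # Adam's usual update
    return grad.sign() * adam_step.abs()      # gradient sign, Adam magnitude
```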
Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective
Multi-task learning through composite loss functions is fundamental to modern deep
learning, yet optimizing competing objectives remains challenging. We present new …
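The gradient alignment in the title can be made concrete as the cosine similarity between the gradients of the competing loss terms. A minimal sketch that measures this conflict (it is not the paper's second-order method, and it assumes both losses depend on all parameters):

```python
import torch

def gradient_alignment(model, loss_a, loss_b):
    """Cosine similarity between the gradients of two loss terms, e.g. a
    PINN's PDE-residual loss and its boundary loss. Values near -1 mean
    the objectives actively compete."""
    ga = torch.autograd.grad(loss_a, model.parameters(), retain_graph=True)
    gb = torch.autograd.grad(loss_b, model.parameters())
    flat_a = torch.cat([g.reshape(-1) for g in ga])
    flat_b = torch.cat([g.reshape(-1) for g in gb])
    return torch.nn.functional.cosine_similarity(flat_a, flat_b, dim=0)
```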
Avoiding spurious sharpness minimization broadens applicability of SAM
Curvature regularization techniques like Sharpness Aware Minimization (SAM) have shown
great promise in improving generalization on vision tasks. However, we find that SAM …
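For reference, the vanilla SAM update this paper builds on: perturb the weights to an approximate local worst case within an L2 ball of radius rho, then descend using the gradient taken there. A minimal PyTorch sketch (loss_fn is a hypothetical closure returning the scalar training loss):

```python
import torch

def sam_step(model, loss_fn, optimizer, rho=0.05):
    # First pass: gradient at the current weights.
    loss_fn(model).backward()
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    # Ascend to the approximate worst case within the rho-ball.
    perturbs = []
    with torch.no_grad():
        for p in params:
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            perturbs.append((p, e))
    model.zero_grad()
    # Second pass: gradient at the perturbed weights.
    loss_fn(model).backward()
    # Restore the original weights, then step with the SAM gradient.
    with torch.no_grad():
        for p, e in perturbs:
            p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
```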
Moonshine: Speech Recognition for Live Transcription and Voice Commands
N Jeffries, E King, M Kudlur, G Nicholson… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces Moonshine, a family of speech recognition models optimized for live
transcription and voice command processing. Moonshine is based on an encoder-decoder …
Physics of Skill Learning
We aim to understand the physics of skill learning, i.e., how skills are learned in neural networks
during training. We start by observing the Domino effect, i.e., skills are learned sequentially …
On The Concurrence of Layer-wise Preconditioning Methods and Provable Feature Learning
Layer-wise preconditioning methods are a family of memory-efficient optimization algorithms
that introduce preconditioners per axis of each layer's weight tensors. These methods have …
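The per-axis preconditioners described here match the Shampoo family. A minimal sketch of one such update for a weight matrix, assuming Shampoo-style accumulated statistics and inverse-fourth-root preconditioning (constants and root exponents vary between methods):

```python
import torch

def shampoo_like_update(G, L, R, eps=1e-6):
    """One per-axis-preconditioned step for a gradient matrix G:
    L accumulates row-axis statistics, R column-axis statistics, and the
    gradient is preconditioned on both sides."""
    L += G @ G.T
    R += G.T @ G
    def inv_quarter(M):
        # Symmetric inverse fourth root via the eigendecomposition.
        vals, vecs = torch.linalg.eigh(M + eps * torch.eye(M.shape[0]))
        return vecs @ torch.diag(vals.clamp_min(eps) ** -0.25) @ vecs.T
    return inv_quarter(L) @ G @ inv_quarter(R), L, R
```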
Improving Adaptive Moment Optimization via Preconditioner Diagonalization
Modern adaptive optimization methods, such as Adam and its variants, have emerged as the
most widely used tools in deep learning over recent years. These algorithms offer automatic …
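As a rough illustration of what "preconditioner diagonalization" can mean here: rotate the gradient into an eigenbasis in which a full second-moment matrix is approximately diagonal, apply Adam's diagonal rule there, and rotate the step back. A one-sided toy sketch under that assumption, not the paper's exact algorithm:

```python
import torch

def diagonalized_adam_step(G, V_row, m, v, beta1=0.9, beta2=0.999, eps=1e-8):
    """Toy step: V_row accumulates row-space second moments of G; Adam's
    diagonal moments m, v live in V_row's eigenbasis."""
    V_row += G @ G.T
    _, Q = torch.linalg.eigh(V_row)       # eigenbasis of the preconditioner
    Gr = Q.T @ G                          # gradient in the rotated basis
    m = beta1 * m + (1 - beta1) * Gr      # Adam's moments, now ~diagonal
    v = beta2 * v + (1 - beta2) * Gr * Gr
    step = Q @ (m / (v.sqrt() + eps))     # rotate the update back
    return step, V_row, m, v
```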