Why transformers need Adam: A Hessian perspective
Y Zhang, C Chen, T Ding, Z Li… - Advances in Neural …, 2025 - proceedings.neurips.cc
SGD performs worse than Adam by a significant margin on Transformers, but the reason
remains unclear. In this work, we provide an explanation through the lens of Hessian: (i) …
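For context on the two optimizers compared in this abstract, a minimal sketch of their update rules is given below (plain NumPy; hyperparameter defaults are illustrative and not taken from the paper, and the Hessian-based analysis itself is not reproduced here).

```python
# Minimal sketch of the SGD and Adam update rules compared in the abstract.
# Hyperparameter defaults are illustrative, not taken from the paper.
import numpy as np

def sgd_step(w, grad, lr=1e-2):
    """Plain SGD: one uniform step against the raw gradient."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: per-coordinate step sizes from running moments of the gradient."""
    m = beta1 * m + (1 - beta1) * grad           # first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # coordinate-wise rescaled step
    return w, m, v
```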
Acceleration by stepsize hedging: Multi-step descent and the silver stepsize schedule
Can we accelerate the convergence of gradient descent without changing the algorithm—
just by judiciously choosing stepsizes? Surprisingly, we show that the answer is yes. Our …
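To make the question concrete, the sketch below runs gradient descent under an arbitrary stepsize schedule on a convex quadratic; the schedules shown are placeholders, and the actual silver stepsize schedule is constructed in the paper.

```python
# Gradient descent driven by a user-supplied (possibly nonconstant) stepsize
# schedule. The schedules below are placeholders; the silver stepsize schedule
# itself is defined in the paper and not reproduced here.
import numpy as np

def gd_with_schedule(grad_f, x0, schedule, L):
    """Iterate x_{k+1} = x_k - (h_k / L) * grad_f(x_k) over the normalized steps h_k."""
    x = np.asarray(x0, dtype=float)
    for h in schedule:
        x = x - (h / L) * grad_f(x)
    return x

# Toy problem: f(x) = 0.5 * x^T A x, smoothness constant L = lambda_max(A).
A = np.diag([1.0, 10.0, 100.0])
L = 100.0
grad_f = lambda x: A @ x

constant_steps = [1.0] * 15                        # classical h_k = 1 (i.e. stepsize 1/L)
nonconstant_steps = [1.0, 2.0, 1.0, 4.0, 1.0] * 3  # placeholder mix of short and long steps

print(gd_with_schedule(grad_f, np.ones(3), constant_steps, L))
print(gd_with_schedule(grad_f, np.ones(3), nonconstant_steps, L))
```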
Compute-efficient deep learning: Algorithmic trends and opportunities
Although deep learning has made great progress in recent years, the exploding economic
and environmental costs of training neural networks are becoming unsustainable. To …
Provably faster gradient descent via long steps
B Grimmer - SIAM Journal on Optimization, 2024 - SIAM
This work establishes new convergence guarantees for gradient descent in smooth convex
optimization via a computer-assisted analysis technique. Our theory allows nonconstant …
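For reference, the classical baseline that such results are measured against is the textbook guarantee for constant-stepsize gradient descent on an L-smooth convex function (a standard bound, not a result of this paper):

```latex
% Gradient descent x_{k+1} = x_k - \tfrac{1}{L}\nabla f(x_k) on an L-smooth
% convex f with minimizer x_\star satisfies, after N steps,
f(x_N) - f(x_\star) \;\le\; \frac{L\,\|x_0 - x_\star\|^2}{2N}.
```

Nonconstant stepsize patterns of the kind analyzed in the abstract aim to improve on this worst-case rate without modifying the algorithm beyond its stepsizes.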
Branch-and-bound performance estimation programming: A unified methodology for constructing optimal optimization methods
We present the Branch-and-Bound Performance Estimation Programming (BnB-PEP), a
unified methodology for constructing optimal first-order methods for convex and nonconvex …
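The performance estimation problems underlying this methodology can be sketched, for fixed-step gradient methods, as the following worst-case optimization (notation chosen here for illustration; the branch-and-bound formulation in the paper additionally optimizes over the stepsizes):

```latex
% Worst-case gap of a gradient method with normalized stepsizes h_0,\dots,h_{N-1}
% over L-smooth convex functions; PEP techniques reduce this infinite-dimensional
% problem to a finite-dimensional semidefinite program.
\max_{f,\;x_0}\quad f(x_N) - f(x_\star)
\quad\text{s.t.}\quad
f \ \text{convex and } L\text{-smooth},\qquad
\|x_0 - x_\star\| \le R,\qquad
x_{k+1} = x_k - \tfrac{h_k}{L}\,\nabla f(x_k),\ \ k = 0,\dots,N-1.
```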
FedP3: Federated personalized and privacy-friendly network pruning under model heterogeneity
The interest in federated learning has surged in recent research due to its unique ability to
train a global model using privacy-secured information held locally on each client. This …
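As background for the setting described here, a minimal federated averaging loop is sketched below; this is a generic FedAvg-style illustration with placeholder names, not the FedP3 pruning method itself.

```python
# Generic federated averaging sketch: each client trains on data that never
# leaves the client, and only model parameters are sent to the server.
# This illustrates the setting in the abstract, not the FedP3 algorithm.
import numpy as np

def local_update(weights, local_data, lr=0.1, steps=5):
    """Placeholder local training: a few gradient steps on the client's own data."""
    X, y = local_data
    w = weights.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)   # least-squares loss as a stand-in
    return w

def federated_round(global_w, client_datasets):
    """One communication round: local training on each client, then server averaging."""
    client_ws = [local_update(global_w, data) for data in client_datasets]
    return np.mean(client_ws, axis=0)

# Toy usage: three clients holding private (X, y) pairs locally.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 4)), rng.normal(size=20)) for _ in range(3)]
w = np.zeros(4)
for _ in range(10):
    w = federated_round(w, clients)
```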
On fundamental proof structures in first-order optimization
First-order optimization methods have attracted a lot of attention due to their practical
success in many applications, including in machine learning. Obtaining convergence …
Towards a better theoretical understanding of independent subnetwork training
Modern advancements in large-scale machine learning would be impossible without the
paradigm of data-parallel distributed computing. Since distributed computing with large …
Variable step sizes for iterative Jacobian-based inverse kinematics of robotic manipulators
This study evaluates the impact of step size selection on Jacobian-based inverse kinematics
(IK) for robotic manipulators. Although traditional constant step size approaches offer …
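To make the step-size question concrete, below is a minimal Jacobian-pseudoinverse IK iteration for a planar two-link arm, with the step size exposed as a parameter; the geometry and the decay rule are illustrative and not taken from the study.

```python
# Jacobian-pseudoinverse inverse kinematics for a planar two-link arm.
# The step size alpha is the quantity whose selection the study evaluates;
# the link lengths and the simple decay rule here are illustrative only.
import numpy as np

L1, L2 = 1.0, 1.0  # link lengths (placeholder geometry)

def fk(q):
    """Forward kinematics: joint angles -> end-effector position."""
    return np.array([L1*np.cos(q[0]) + L2*np.cos(q[0]+q[1]),
                     L1*np.sin(q[0]) + L2*np.sin(q[0]+q[1])])

def jacobian(q):
    """Analytic Jacobian of the forward kinematics."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0]+q[1]), np.cos(q[0]+q[1])
    return np.array([[-L1*s1 - L2*s12, -L2*s12],
                     [ L1*c1 + L2*c12,  L2*c12]])

def ik_step(q, target, alpha):
    """One iteration: q <- q + alpha * J^+ (target - fk(q))."""
    err = target - fk(q)
    return q + alpha * np.linalg.pinv(jacobian(q)) @ err

q = np.array([0.3, 0.3])
target = np.array([1.2, 0.8])
for k in range(50):
    alpha = 1.0 / (1.0 + 0.1 * k)   # one possible variable step-size rule (illustrative)
    q = ik_step(q, target, alpha)
print(fk(q), "target:", target)
```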
Block acceleration without momentum: On optimal stepsizes of block gradient descent for least-squares
Block coordinate descent is a powerful algorithmic template suitable for big data
optimization. This template admits a lot of variants including block gradient descent (BGD) …
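A minimal version of the block gradient descent (BGD) template mentioned in this abstract, applied to least-squares, is sketched below; the per-block stepsizes 1/L_i use each block's own Lipschitz constant as the standard safe choice, while the optimal stepsizes analyzed in the paper are not reproduced here.

```python
# Block gradient descent for least-squares 0.5*||A x - b||^2: at each iteration,
# update one block of coordinates with a block-wise stepsize. The stepsize 1/L_i
# below is the standard safe choice, not the optimal one studied in the paper.
import numpy as np

def block_gradient_descent(A, b, blocks, iters=100):
    x = np.zeros(A.shape[1])
    # Block Lipschitz constants: largest eigenvalue of A_i^T A_i for each block i.
    L = [np.linalg.norm(A[:, blk], 2) ** 2 for blk in blocks]
    for t in range(iters):
        i = t % len(blocks)                 # cyclic block selection
        blk = blocks[i]
        grad_blk = A[:, blk].T @ (A @ x - b)
        x[blk] -= grad_blk / L[i]           # block step with stepsize 1/L_i
    return x

# Toy usage: compare against the full least-squares solution.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 6))
b = rng.normal(size=50)
blocks = [np.arange(0, 3), np.arange(3, 6)]   # two coordinate blocks
x_bgd = block_gradient_descent(A, b, blocks)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_bgd - x_ls))
```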