Why transformers need Adam: A Hessian perspective

Y Zhang, C Chen, T Ding, Z Li… - Advances in Neural …, 2025 - proceedings.neurips.cc
SGD performs worse than Adam by a significant margin on Transformers, but the reason
remains unclear. In this work, we provide an explanation through the lens of Hessian: (i) …
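
For orientation, here is a minimal NumPy sketch contrasting the two update rules the snippet compares; the toy quadratic loss, its curvature values, and all hyperparameters are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch (not from the paper): SGD vs. Adam updates on a toy quadratic,
# illustrating the per-coordinate adaptive scaling that Adam applies and SGD does not.
import numpy as np

def grad(w):
    # Toy ill-conditioned quadratic loss 0.5 * w^T diag(h) w; h mimics a
    # heterogeneous curvature spectrum -- an assumption for illustration only.
    h = np.array([100.0, 1.0, 0.01])
    return h * w

def sgd_step(w, lr=1e-2):
    return w - lr * grad(w)

def adam_step(w, m, v, t, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    g = grad(w)
    m = b1 * m + (1 - b1) * g            # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g        # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)            # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w_sgd = w_adam = np.array([1.0, 1.0, 1.0])
m = v = np.zeros(3)
for t in range(1, 201):
    w_sgd = sgd_step(w_sgd)
    w_adam, m, v = adam_step(w_adam, m, v, t)
print("SGD :", w_sgd)
print("Adam:", w_adam)
```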

Acceleration by stepsize hedging: Multi-step descent and the silver stepsize schedule

J Altschuler, P Parrilo - Journal of the ACM, 2023 - dl.acm.org
Can we accelerate the convergence of gradient descent without changing the algorithm—
just by judiciously choosing stepsizes? Surprisingly, we show that the answer is yes. Our …
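
As a rough illustration of the idea (gradient descent whose only change is a prescribed, nonconstant stepsize schedule), here is a hedged sketch on a least-squares objective; the placeholder schedule below is an assumption for illustration and is not the silver stepsize schedule constructed in the paper.

```python
# Illustrative sketch (not the paper's construction): gradient descent with a
# prescribed nonconstant stepsize schedule on an L-smooth convex quadratic.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
L = np.linalg.eigvalsh(A.T @ A).max()        # smoothness constant of 0.5*||Ax-b||^2

def gd_with_schedule(schedule, x0):
    x = x0.copy()
    for h in schedule:                       # h is the stepsize in units of 1/L
        x = x - (h / L) * (A.T @ (A @ x - b))
    return 0.5 * np.linalg.norm(A @ x - b) ** 2

x0 = np.zeros(20)
n = 15
constant = [1.0] * n                         # classical h = 1/L baseline
# Placeholder nonconstant schedule: mostly short steps with occasional longer
# ones -- an assumption for illustration only, NOT the silver stepsize schedule.
hedged = [1.0, 1.0, 3.0] * (n // 3)

print("constant steps :", gd_with_schedule(constant, x0))
print("nonconstant    :", gd_with_schedule(hedged, x0))
```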

Compute-efficient deep learning: Algorithmic trends and opportunities

BR Bartoldson, B Kailkhura, D Blalock - Journal of Machine Learning …, 2023 - jmlr.org
Although deep learning has made great progress in recent years, the exploding economic
and environmental costs of training neural networks are becoming unsustainable. To …

Provably faster gradient descent via long steps

B Grimmer - SIAM Journal on Optimization, 2024 - SIAM
This work establishes new convergence guarantees for gradient descent in smooth convex
optimization via a computer-assisted analysis technique. Our theory allows nonconstant …
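
For reference, the classical constant-stepsize baseline that results of this kind improve on: for convex f with L-Lipschitz gradient and stepsize 1/L, the textbook guarantee after N iterations is

```latex
% Classical baseline: gradient descent x_{k+1} = x_k - (1/L) \nabla f(x_k)
% on a convex, L-smooth function f with minimizer x^\star.
\[
  f(x_N) - f(x^\star) \;\le\; \frac{L \,\lVert x_0 - x^\star \rVert^2}{2N}.
\]
```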

Branch-and-bound performance estimation programming: A unified methodology for constructing optimal optimization methods

S Das Gupta, BPG Van Parys, EK Ryu - Mathematical Programming, 2024 - Springer
We present the Branch-and-Bound Performance Estimation Programming (BnB-PEP), a
unified methodology for constructing optimal first-order methods for convex and nonconvex …
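
For context, the generic performance estimation problem (PEP) that this line of work builds on can be written schematically as below; the notation follows the standard PEP literature rather than the paper itself, and N, R, and the stepsize coefficients h_{k,i} are placeholders.

```latex
% Worst-case performance of a fixed-step first-order method, posed as an
% optimization problem over an L-smooth function class F_L.
\[
  \max_{f \in \mathcal{F}_L,\; x_0,\dots,x_N}\; f(x_N) - f(x^\star)
  \quad\text{s.t.}\quad
  x_{k+1} = x_k - \sum_{i=0}^{k} h_{k,i}\, \nabla f(x_i),
  \qquad \lVert x_0 - x^\star \rVert \le R .
\]
```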

FedP3: Federated personalized and privacy-friendly network pruning under model heterogeneity

K Yi, N Gazagnadou, P Richtárik, L Lyu - arXiv preprint arXiv:2404.09816, 2024 - arxiv.org
The interest in federated learning has surged in recent research due to its unique ability to
train a global model using privacy-secured information held locally on each client. This …

On fundamental proof structures in first-order optimization

B Goujaud, A Dieuleveut… - 2023 62nd IEEE …, 2023 - ieeexplore.ieee.org
First-order optimization methods have attracted a lot of attention due to their practical
success in many applications, including in machine learning. Obtaining convergence …

Towards a better theoretical understanding of independent subnetwork training

E Shulgin, P Richtárik - arXiv preprint arXiv:2306.16484, 2023 - arxiv.org
Modern advancements in large-scale machine learning would be impossible without the
paradigm of data-parallel distributed computing. Since distributed computing with large …

Variable step sizes for iterative Jacobian-based inverse kinematics of robotic manipulators

J Colan, A Davila, Y Hasegawa - IEEE Access, 2024 - ieeexplore.ieee.org
This study evaluates the impact of step size selection on Jacobian-based inverse kinematics
(IK) for robotic manipulators. Although traditional constant step size approaches offer …
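
A minimal sketch of the general mechanism being evaluated, Jacobian-based IK with a non-constant step size; the 2-link planar arm, the pseudoinverse direction, and the backtracking rule are assumptions for illustration, not the manipulators or step-size rules studied in the paper.

```python
# Sketch (assumed 2-link planar arm): Jacobian-based IK where the step size is
# adapted by backtracking on the task-space error instead of a fixed constant.
import numpy as np

L1, L2 = 1.0, 0.8                      # link lengths (illustrative)

def fk(q):
    # End-effector position of a 2-link planar arm.
    return np.array([
        L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
        L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1]),
    ])

def jacobian(q):
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-L1 * s1 - L2 * s12, -L2 * s12],
        [ L1 * c1 + L2 * c12,  L2 * c12],
    ])

def ik_variable_step(q, target, iters=100, tol=1e-6):
    for _ in range(iters):
        err = target - fk(q)
        if np.linalg.norm(err) < tol:
            break
        dq = np.linalg.pinv(jacobian(q)) @ err    # Jacobian pseudoinverse direction
        alpha = 1.0
        # Backtracking: shrink the step until the task-space error decreases.
        while alpha > 1e-4 and np.linalg.norm(target - fk(q + alpha * dq)) >= np.linalg.norm(err):
            alpha *= 0.5
        q = q + alpha * dq
    return q

q0 = np.array([0.3, 0.3])
target = np.array([1.2, 0.6])
q_sol = ik_variable_step(q0, target)
print("solution q:", q_sol, "reached:", fk(q_sol))
```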

Block acceleration without momentum: On optimal stepsizes of block gradient descent for least-squares

L Peng, W Yin - arXiv preprint arXiv:2405.16020, 2024 - arxiv.org
Block coordinate descent is a powerful algorithmic template suitable for big data
optimization. This template admits a lot of variants including block gradient descent (BGD) …
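
A short sketch of the BGD template on least-squares with the common per-block stepsize 1/L_i (the block smoothness constant); the problem data and block split are illustrative assumptions, and this is the standard baseline rather than the optimal stepsizes derived in the paper.

```python
# Sketch of block gradient descent (BGD) on 0.5*||Ax - b||^2 with per-block
# stepsize 1/L_i, where L_i is the largest eigenvalue of A_i^T A_i.
import numpy as np

rng = np.random.default_rng(2)
m, n = 200, 60
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

blocks = np.array_split(np.arange(n), 4)              # 4 coordinate blocks
L_block = [np.linalg.eigvalsh(A[:, idx].T @ A[:, idx]).max() for idx in blocks]

x = np.zeros(n)
for epoch in range(50):
    for idx, L_i in zip(blocks, L_block):
        r = A @ x - b                                  # current residual
        g_i = A[:, idx].T @ r                          # gradient w.r.t. block i
        x[idx] -= (1.0 / L_i) * g_i                    # block update with step 1/L_i
    if epoch % 10 == 0:
        print(f"epoch {epoch}: 0.5*||Ax-b||^2 = {0.5*np.linalg.norm(A@x-b)**2:.4f}")
```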