Adam-mini: Use fewer learning rates to gain more

Y Zhang, C Chen, Z Li, T Ding, C Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose Adam-mini, an optimizer that achieves performance on par with or better than
AdamW with a 50% smaller memory footprint. Adam-mini reduces memory by cutting down the …
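
The snippet only hints at the mechanism; below is a minimal sketch of the stated idea (keep Adam's per-coordinate first moment but share a single second-moment value per parameter block, which is where the memory saving comes from). The block partition (one block per tensor), hyperparameters, and function name are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def adam_mini_like_step(params, grads, m, v_block, t, lr=1e-3,
                        beta1=0.9, beta2=0.999, eps=1e-8):
    """One illustrative Adam-mini-style step (sketch, not the paper's algorithm).

    params, grads, m : dict name -> np.ndarray (per-coordinate first moment kept)
    v_block          : dict name -> float (one second-moment scalar per block,
                       replacing Adam's per-coordinate v; this is the memory saving)
    """
    for name, g in grads.items():
        m[name] = beta1 * m[name] + (1 - beta1) * g
        # One mean-of-squares statistic for the whole block. The partition used
        # here (one block per parameter tensor) is an assumption; Adam-mini's
        # actual partition is finer (e.g. per attention head).
        v_block[name] = beta2 * v_block[name] + (1 - beta2) * float(np.mean(g * g))
        m_hat = m[name] / (1 - beta1 ** t)
        v_hat = v_block[name] / (1 - beta2 ** t)
        params[name] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, m, v_block
```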

Soap: Improving and stabilizing shampoo using adam

N Vyas, D Morwani, R Zhao, I Shapira… - arXiv preprint arXiv …, 2024 - arxiv.org
There is growing evidence of the effectiveness of Shampoo, a higher-order preconditioning
method, over Adam in deep learning optimization tasks. However, Shampoo's drawbacks …
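
As a rough sketch of the combination the title names (running Adam-style moment updates in the eigenbasis of Shampoo's Kronecker-factor statistics, with the basis refreshed periodically), shown for a single matrix parameter. The refresh cadence, scaling, and the omission of moment rotation at refresh time are simplifying assumptions.

```python
import numpy as np

def soap_like_step(W, G, state, lr=3e-4, beta1=0.9, beta2=0.999,
                   eps=1e-8, refresh_every=10):
    """Illustrative step for one matrix parameter W with gradient G (sketch).

    Maintains Shampoo-style factor statistics L = sum G G^T and R = sum G^T G,
    rotates the gradient into their eigenbases, runs Adam-style moments there,
    and rotates the update back.
    """
    state.setdefault("t", 0)
    state.setdefault("L", np.zeros((W.shape[0], W.shape[0])))
    state.setdefault("R", np.zeros((W.shape[1], W.shape[1])))
    state.setdefault("m", np.zeros_like(W))
    state.setdefault("v", np.zeros_like(W))
    state["t"] += 1

    state["L"] += G @ G.T
    state["R"] += G.T @ G
    if state["t"] % refresh_every == 1:
        # Periodically recompute the eigenbases; rotating the Adam moments into
        # the new basis (which the full method would do) is omitted here.
        state["QL"] = np.linalg.eigh(state["L"])[1]
        state["QR"] = np.linalg.eigh(state["R"])[1]
    QL, QR = state["QL"], state["QR"]

    G_rot = QL.T @ G @ QR                      # gradient in the rotated basis
    state["m"] = beta1 * state["m"] + (1 - beta1) * G_rot
    state["v"] = beta2 * state["v"] + (1 - beta2) * G_rot ** 2
    update_rot = state["m"] / (np.sqrt(state["v"]) + eps)
    W -= lr * (QL @ update_rot @ QR.T)         # rotate the update back
    return W, state
```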

Cautious optimizers: Improving training with one line of code

K Liang, L Chen, B Liu, Q Liu - arXiv preprint arXiv:2411.16085, 2024 - arxiv.org
AdamW has been the default optimizer for transformer pretraining. For many years, our
community has searched for faster and more stable optimizers, with only constrained positive …
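
The "one line" is commonly described as masking out update coordinates whose sign disagrees with the current gradient; the rescaling by the mask's mean below is one plausible reading and may differ in detail from the paper.

```python
import numpy as np

def cautious_mask(update, grad, eps=1e-8):
    """Apply a cautious mask to a proposed optimizer update (e.g. AdamW's).

    Coordinates where the proposed update and the current gradient point in
    opposite directions are zeroed out; the survivors are rescaled by the
    mask's mean so the average step size is roughly preserved (the rescaling
    is an assumption).
    """
    mask = (update * grad > 0).astype(update.dtype)
    return update * mask / (mask.mean() + eps)

# Usage sketch: p -= lr * cautious_mask(adamw_update, grad)
```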

Rethinking conventional wisdom in machine learning: From generalization to scaling

L Xiao - arXiv preprint arXiv:2409.15156, 2024 - arxiv.org
The remarkable success of large language pretraining and the discovery of scaling laws
signify a paradigm shift in machine learning. Notably, the primary objective has evolved from …

JaColBERTv2.5: Optimising Multi-Vector Retrievers to Create State-of-the-Art Japanese Retrievers with Constrained Resources

B Clavié - arXiv preprint arXiv:2407.20750, 2024 - arxiv.org
Neural Information Retrieval has advanced rapidly in high-resource languages, but progress
in lower-resource ones such as Japanese has been hindered by data scarcity, among other …

4-bit Shampoo for Memory-Efficient Network Training

S Wang, P Zhou, J Li, H Huang - Advances in Neural …, 2025 - proceedings.neurips.cc
Second-order optimizers, maintaining a matrix termed a preconditioner, are superior to first-
order optimizers in both theory and practice. The states forming the preconditioner and its …
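
The snippet says the states forming the preconditioner are compressed; a generic per-block symmetric 4-bit quantize/dequantize pair is sketched below purely to illustrate how such states could be stored at 4 bits. It is not the paper's specific scheme (which targets the preconditioner's eigenvector matrix) and the block size is an assumption.

```python
import numpy as np

def quantize_4bit(x, block=64):
    """Per-block symmetric 4-bit quantization of a matrix (illustration only).

    Each block of 64 values gets one float scale; codes live in [-8, 7] and are
    stored here in int8 for simplicity (real implementations pack two per byte).
    """
    flat = x.reshape(-1)
    pad = (-len(flat)) % block
    flat = np.concatenate([flat, np.zeros(pad, dtype=x.dtype)])
    blocks = flat.reshape(-1, block)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0 + 1e-12
    q = np.clip(np.round(blocks / scales), -8, 7).astype(np.int8)
    return q, scales, x.shape, pad

def dequantize_4bit(q, scales, shape, pad):
    """Reconstruct an approximate float matrix from the 4-bit codes."""
    flat = (q.astype(np.float32) * scales).reshape(-1)
    if pad:
        flat = flat[:-pad]
    return flat.reshape(shape)
```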

How Does Critical Batch Size Scale in Pre-training?

H Zhang, D Morwani, N Vyas, J Wu, D Zou… - arXiv preprint arXiv …, 2024 - arxiv.org
Training large-scale models under given resources requires careful design of parallelism
strategies. In particular, the efficiency notion of critical batch size (CBS), concerning the …

An adaptive stochastic gradient method with non-negative gauss-newton stepsizes

A Orvieto, L Xiao - arXiv preprint arXiv:2407.04358, 2024 - arxiv.org
We consider the problem of minimizing the average of a large number of smooth but
possibly non-convex functions. In the context of most machine learning applications, each …
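
A sketch of an SGD step with a Gauss-Newton-style non-negative stepsize. The formula gamma = sigma / (1 + sigma * ||g||^2 / (2 f(x))) used here is an assumption based on the general idea (it requires a non-negative loss) and may not match the paper's exact stepsize.

```python
import numpy as np

def ngn_like_step(x, loss_fn, grad_fn, sigma=0.5):
    """One SGD step with an adaptive, always non-negative stepsize (sketch)."""
    f = loss_fn(x)
    g = grad_fn(x)
    # Stepsize shrinks automatically when the gradient is large relative to the
    # loss value; sigma caps the stepsize from above.
    gamma = sigma / (1.0 + sigma * float(g @ g) / (2.0 * f + 1e-12))
    return x - gamma * g

# Example on the non-negative quadratic f(x) = 0.5 * ||x||^2:
x = np.array([2.0, -1.0])
for _ in range(20):
    x = ngn_like_step(x, lambda z: 0.5 * z @ z, lambda z: z)
```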

General framework for online-to-nonconvex conversion: Schedule-free SGD is also effective for nonconvex optimization

K Ahn, G Magakyan, A Cutkosky - arXiv preprint arXiv:2411.07061, 2024 - arxiv.org
This work investigates the effectiveness of schedule-free methods, developed by A. Defazio
et al. (NeurIPS 2024), in nonconvex optimization settings, inspired by their remarkable …
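
A simplified sketch of schedule-free SGD in the style of Defazio et al.: gradients are taken at an interpolation of a fast iterate and its running average, so no learning-rate schedule is needed. The warmup and weighting details of the full method are omitted here.

```python
import numpy as np

def schedule_free_sgd(grad_fn, x0, lr=0.1, beta=0.9, steps=100):
    """Schedule-free SGD sketch (simplified from Defazio et al., NeurIPS 2024)."""
    z = x0.copy()          # "fast" SGD iterate
    x = x0.copy()          # running average, returned for evaluation
    for t in range(1, steps + 1):
        y = (1 - beta) * z + beta * x      # point where the gradient is taken
        z = z - lr * grad_fn(y)
        c = 1.0 / t
        x = (1 - c) * x + c * z            # uniform average of the z iterates
    return x

# Example: minimize f(x) = 0.5 * ||x||^2
x_star = schedule_free_sgd(lambda v: v, np.array([3.0, -2.0]))
```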

AI-driven skin cancer diagnosis: Grad-CAM and expert annotations for enhanced interpretability

I Matas, C Serrano, F Silva, A Serrano… - arXiv preprint arXiv …, 2024 - arxiv.org
An AI tool has been developed to provide interpretable support for the diagnosis of basal cell carcinoma (BCC) via
teledermatology, thus speeding up referrals and optimizing resource utilization. The …