One fits all: Power general time series analysis by pretrained LM

T Zhou, P Niu, L Sun, R Jin - Advances in neural …, 2023 - proceedings.neurips.cc
Although we have witnessed great success of pre-trained models in natural language
processing (NLP) and computer vision (CV), limited progress has been made for general …

The road less scheduled

A Defazio, X Yang, A Khaled… - Advances in …, 2025 - proceedings.neurips.cc
Existing learning rate schedules that do not require specification of the optimization stopping
step $T$ are greatly out-performed by learning rate schedules that depend on $T$. We …
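
To illustrate the gap this snippet describes, here is a minimal toy comparison (not the paper's schedule-free method): SGD on a least-squares problem with a horizon-free $1/\sqrt{t}$ schedule versus a linear-decay schedule that must know the stopping step $T$ in advance. Problem sizes, step sizes, and the seed are arbitrary choices for the sketch.

```python
# Toy comparison: a schedule that ignores T vs. one that needs T up front.
import numpy as np

rng = np.random.default_rng(0)
d, n, T = 20, 200, 5000
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]

def sgd(lr_fn):
    x = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(n)                       # sample one data point
        g = (A[i] @ x - b[i]) * A[i]              # stochastic gradient
        x -= lr_fn(t) * g
    return np.linalg.norm(x - x_star)

base = 0.05
no_horizon   = sgd(lambda t: base / np.sqrt(t))          # does not use T
linear_decay = sgd(lambda t: base * (1 - (t - 1) / T))   # requires knowing T
print(f"1/sqrt(t) schedule:    dist to optimum = {no_horizon:.4f}")
print(f"linear decay (uses T): dist to optimum = {linear_decay:.4f}")
```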

A unified theory of decentralized SGD with changing topology and local updates

A Koloskova, N Loizou, S Boreiri… - … on machine learning, 2020 - proceedings.mlr.press
Decentralized stochastic optimization methods have gained a lot of attention recently, mainly
because of their cheap per-iteration cost, data locality, and their communication efficiency. In …
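
As a concrete instance of the algorithm family this framework covers, below is a minimal sketch of decentralized SGD with gossip averaging on a fixed ring topology; the paper's analysis also allows changing topologies and multiple local updates. The quadratic node objectives and mixing matrix are illustrative assumptions.

```python
# Decentralized SGD sketch: local gradient step, then gossip averaging with a
# doubly stochastic mixing matrix W (fixed ring topology).
import numpy as np

rng = np.random.default_rng(1)
n_nodes, d, T, lr = 8, 10, 2000, 0.05

# Each node i holds its own objective f_i(x) = 0.5 * ||x - c_i||^2.
centers = rng.normal(size=(n_nodes, d))
x = np.zeros((n_nodes, d))                     # one iterate per node

# Ring mixing matrix: average self and the two neighbours.
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = W[i, (i - 1) % n_nodes] = W[i, (i + 1) % n_nodes] = 1 / 3

for t in range(T):
    grads = (x - centers) + 0.1 * rng.normal(size=x.shape)   # stochastic gradients
    x = W @ (x - lr * grads)                   # local step, then gossip mixing

consensus = x.mean(axis=0)
print("distance to optimum:", np.linalg.norm(consensus - centers.mean(axis=0)))
print("consensus error:    ", np.linalg.norm(x - consensus))
```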

Sparsified SGD with memory

SU Stich, JB Cordonnier… - Advances in neural …, 2018 - proceedings.neurips.cc
Huge-scale machine learning problems are nowadays tackled by distributed optimization
algorithms, i.e., algorithms that leverage the compute power of many devices for training. The …
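
A minimal single-worker sketch of the technique named in the title, assuming a toy least-squares problem: top-$k$ sparsification of the gradient with an error-feedback memory that re-injects the dropped coordinates into later updates.

```python
# Top-k sparsified SGD with memory: coordinates removed by the sparsifier are
# accumulated and added back to subsequent updates instead of being discarded.
import numpy as np

rng = np.random.default_rng(2)
d, n, T, lr, k = 20, 200, 5000, 0.02, 4      # keep only k of d coordinates

A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d)                   # noiseless, so Ax = b is solvable
x, memory = np.zeros(d), np.zeros(d)

def top_k(v, k):
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]         # indices of the k largest magnitudes
    out[idx] = v[idx]
    return out

for t in range(T):
    i = rng.integers(n)
    g = (A[i] @ x - b[i]) * A[i]             # stochastic gradient of 0.5*(a_i x - b_i)^2
    update = top_k(memory + lr * g, k)       # sparsify scaled gradient + carried error
    memory = memory + lr * g - update        # memory keeps whatever was dropped
    x -= update

print("relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```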

A modern introduction to online learning

F Orabona - arXiv preprint arXiv:1912.13213, 2019 - arxiv.org
In this monograph, I introduce the basic concepts of Online Learning through a modern view
of Online Convex Optimization. Here, online learning refers to the framework of regret …
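
A minimal sketch of the regret framework referred to here, assuming a toy one-dimensional stream of absolute-value losses: online (sub)gradient descent with a $1/\sqrt{t}$ step size, with regret measured against the best fixed point in hindsight.

```python
# Online gradient descent on losses f_t(x) = |x - z_t|; the best fixed
# comparator in hindsight is the median of the stream.
import numpy as np

rng = np.random.default_rng(3)
T = 1000
z = rng.normal(loc=2.0, size=T)              # the loss sequence (could be adversarial)

x, losses = 0.0, []
for t in range(1, T + 1):
    losses.append(abs(x - z[t - 1]))         # suffer the loss, then observe it
    grad = np.sign(x - z[t - 1])             # subgradient of |x - z_t|
    x -= grad / np.sqrt(t)                   # OGD with a 1/sqrt(t) step size

best_fixed = np.median(z)                    # minimizer of sum_t |x - z_t|
regret = sum(losses) - np.abs(best_fixed - z).sum()
print(f"regret after {T} rounds: {regret:.2f}  (regret/T = {regret / T:.4f})")
```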

Local SGD converges fast and communicates little

SU Stich - arXiv preprint arXiv:1805.09767, 2018 - arxiv.org
Mini-batch stochastic gradient descent (SGD) is state of the art in large-scale distributed
training. The scheme can reach a linear speedup with respect to the number of workers, but …
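
A minimal sketch of local SGD, assuming a toy least-squares problem split across workers: each worker runs SGD on its own data shard, and the iterates are averaged only every $H$ steps instead of communicating at every step.

```python
# Local SGD: K workers take H local steps between synchronization rounds.
import numpy as np

rng = np.random.default_rng(4)
K, H, d, n, T, lr = 4, 10, 10, 1000, 2000, 0.02

A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.05 * rng.normal(size=n)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]
shards = np.array_split(rng.permutation(n), K)   # each worker gets a data shard

x = np.zeros((K, d))                             # one local iterate per worker
for t in range(1, T + 1):
    for w in range(K):
        i = rng.choice(shards[w])
        g = (A[i] @ x[w] - b[i]) * A[i]          # local stochastic gradient
        x[w] -= lr * g
    if t % H == 0:                               # infrequent communication
        x[:] = x.mean(axis=0)

print("distance to optimum:", np.linalg.norm(x.mean(axis=0) - x_star))
```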

Smart “predict, then optimize”

AN Elmachtoub, P Grigas - Management Science, 2022 - pubsonline.informs.org
Many real-world analytics problems involve two significant challenges: prediction and
optimization. Because of the typically complex nature of each challenge, the standard …
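
A toy sketch of the predict-then-optimize pipeline and the induced decision (SPO) loss, assuming a simple "pick one of $d$ items" decision problem and a least-squares cost predictor; the paper's SPO+ surrogate for training is not implemented here.

```python
# Stage 1: predict unknown item costs from features. Stage 2: choose the
# cheapest item under the predicted costs. The SPO loss is the excess true
# cost of that decision relative to the best decision in hindsight.
import numpy as np

rng = np.random.default_rng(5)
d, p, n = 5, 3, 200                        # d decisions, p features, n samples

W_true = rng.normal(size=(p, d))
X = rng.normal(size=(n, p))
C = X @ W_true + 0.1 * rng.normal(size=(n, d))   # true cost vectors

W_hat = np.linalg.lstsq(X, C, rcond=None)[0]     # stage 1: cost prediction

spo_losses = []
for x_feat, c_true in zip(X, C):
    c_hat = x_feat @ W_hat
    decision = np.argmin(c_hat)                  # stage 2: optimize predicted costs
    oracle = np.argmin(c_true)                   # best decision in hindsight
    spo_losses.append(c_true[decision] - c_true[oracle])  # excess true cost

print("average SPO (decision) loss:", np.mean(spo_losses))
```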

Don't use large mini-batches, use local SGD

T Lin, SU Stich, KK Patel, M Jaggi - arXiv preprint arXiv:1808.07217, 2018 - arxiv.org
Mini-batch stochastic gradient methods (SGD) are state of the art for distributed training of
deep neural networks. Drastic increases in the mini-batch sizes have led to key efficiency …

A finite time analysis of temporal difference learning with linear function approximation

J Bhandari, D Russo, R Singal - Conference on learning …, 2018 - proceedings.mlr.press
Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value
function corresponding to a given policy in a Markov decision process. Although TD is one of …
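
A minimal sketch of TD(0) with linear function approximation, assuming a small random-walk chain and one-hot features (the tabular special case of the linear parameterization): the iterate moves along the feature vector of the visited state by the step size times the TD error.

```python
# TD(0) with linear function approximation: v(s) is approximated by phi(s) @ w.
import numpy as np

rng = np.random.default_rng(6)
n_states, gamma, alpha, T = 5, 0.9, 0.05, 20000

phi = np.eye(n_states)        # one-hot features: simplest linear parameterization
w = np.zeros(n_states)

s = 2
for t in range(T):
    s_next = (s + rng.choice([-1, 1])) % n_states   # fixed random-walk policy
    r = 1.0 if s_next == 0 else 0.0                 # reward for entering state 0
    td_error = r + gamma * phi[s_next] @ w - phi[s] @ w
    w += alpha * td_error * phi[s]                  # TD(0) update along the features
    s = s_next

print("estimated state values:", np.round(phi @ w, 3))
```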

FedSplit: An algorithmic framework for fast federated optimization

R Pathak, MJ Wainwright - Advances in neural information …, 2020 - proceedings.neurips.cc
Motivated by federated learning, we consider the hub-and-spoke model of distributed
optimization in which a central authority coordinates the computation of a solution among …
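
A hedged sketch in the spirit of this hub-and-spoke setting, not the paper's exact pseudocode: Peaceman-Rachford consensus splitting with exact proximal steps for least-squares clients (the paper also treats inexact local prox solves); the objectives, step size, and sizes below are illustrative assumptions.

```python
# Hub-and-spoke consensus splitting: the server averages client blocks, each
# client applies a proximal step to its own objective f_j(x) = 0.5*||A_j x - b_j||^2.
import numpy as np

rng = np.random.default_rng(7)
m, d, n_j, s, T = 5, 8, 40, 0.05, 500         # m clients, prox step size s

A = [rng.normal(size=(n_j, d)) for _ in range(m)]
b = [A_j @ rng.normal(size=d) + 0.1 * rng.normal(size=n_j) for A_j in A]

def prox(j, u):
    """Closed-form prox: argmin_x f_j(x) + ||x - u||^2 / (2s)."""
    return np.linalg.solve(A[j].T @ A[j] + np.eye(d) / s, A[j].T @ b[j] + u / s)

z = np.zeros((m, d))                          # one block of variables per client
for t in range(T):
    x_bar = z.mean(axis=0)                    # server: average the client blocks
    halves = np.array([prox(j, 2 * x_bar - z[j]) for j in range(m)])  # local prox
    z += 2 * (halves - x_bar)                 # local reflection/centering step

x_hat = halves.mean(axis=0)
x_star = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)[0]
print("distance to centralized least-squares solution:", np.linalg.norm(x_hat - x_star))
```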