A review of sparse expert models in deep learning

W Fedus, J Dean, B Zoph - arXiv preprint arXiv:2209.01667, 2022 - arxiv.org
Sparse expert models are a thirty-year-old concept re-emerging as a popular architecture in
deep learning. This class of architecture encompasses Mixture-of-Experts, Switch …

Branch-train-merge: Embarrassingly parallel training of expert language models

M Li, S Gururangan, T Dettmers, M Lewis… - arXiv preprint arXiv …, 2022 - arxiv.org
We present Branch-Train-Merge (BTM), a communication-efficient algorithm for
embarrassingly parallel training of large language models (LLMs). We show it is possible to …

Continual pre-training of large language models: How to (re) warm your model?

K Gupta, B Thérien, A Ibrahim, ML Richter… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart
the process over again once new data becomes available. A much cheaper and more …

Unified scaling laws for routed language models

A Clark, D de Las Casas, A Guy… - International …, 2022 - proceedings.mlr.press
The performance of a language model has been shown to be effectively modeled as a
power-law in its parameter count. Here we study the scaling behaviors of Routing Networks …

Dynamically expandable graph convolution for streaming recommendation

B He, X He, Y Zhang, R Tang, C Ma - … of the ACM Web Conference 2023, 2023 - dl.acm.org
Personalized recommender systems have been widely studied and deployed to reduce
information overload and satisfy users' diverse needs. However, conventional …

ProgFed: effective, communication, and computation efficient federated learning by progressive training

HP Wang, S Stich, Y He, M Fritz - … Conference on Machine …, 2022 - proceedings.mlr.press
Federated learning is a powerful distributed learning scheme that allows numerous edge
devices to collaboratively train a model without sharing their data. However, training is …

Learning equi-angular representations for online continual learning

M Seo, H Koh, W Jeung, M Lee, S Kim… - Proceedings of the …, 2024 - openaccess.thecvf.com
Online continual learning suffers from an underfitted solution due to insufficient training for
prompt model updates (e.g., single-epoch training). To address the challenge, we propose an …

NEVIS'22: A stream of 100 tasks sampled from 30 years of computer vision research

J Bornschein, A Galashov, R Hemsley… - Journal of Machine …, 2023 - jmlr.org
A shared goal of several machine learning communities like continual learning, meta-
learning and transfer learning, is to design algorithms and models that efficiently and …

Just say the name: Online continual learning with category names only via data generation

M Seo, S Cho, M Lee, D Misra, H Choi, SJ Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Requiring extensive human supervision is often impractical for continual learning due to its
cost, leading to the emergence of 'name-only continual learning' that only provides the name …

When does re-initialization work?

S Zaidi, T Berariu, H Kim, J Bornschein… - Proceedings …, 2023 - proceedings.mlr.press
Re-initializing a neural network during training has been observed to improve generalization
in recent works. Yet it is neither widely adopted in deep learning practice nor is it often used …