- Academic Search

A Albalak, Y Elazar, SM …

[PDF] mlr.press

Cramming: Training a Language Model on a single GPU in one day.

J Geiping, T Goldstein - International Conference on …, 2023 - proceedings.mlr.press

Recent trends in language modeling have focused on increasing performance through
scaling, and have resulted in an environment where training language models is out of …

Cited by 76

[PDF] neurips.cc

No train no gain: Revisiting efficient training algorithms for transformer-based language models

J Kaddour, O Key, P Nawrot… - Advances in Neural …, 2023 - proceedings.neurips.cc

The computation necessary for training Transformer-based language models has
skyrocketed in recent years. This trend has motivated research on efficient training …

Cited by 30

[PDF] arxiv.org

Less: Selecting influential data for targeted instruction tuning

M Xia, S Malladi, S Gururangan, S Arora… - arXiv preprint arXiv …, 2024 - arxiv.org

Instruction tuning has unlocked powerful capabilities in large language models (LLMs),
effectively using combined datasets to develop general-purpose chatbots. However, real …

Cited by 132

[PDF] jmlr.org

Compute-efficient deep learning: Algorithmic trends and opportunities

BR Bartoldson, B Kailkhura, D Blalock - Journal of Machine Learning …, 2023 - jmlr.org

Although deep learning has made great progress in recent years, the exploding economic
and environmental costs of training neural networks are becoming unsustainable. To …

Cited by 50

Prioritized training on points that are learnable, worth learning, and not yet learnt

A survey on data selection for language models