A survey on data selection for language models

A Albalak, Y Elazar, SM Xie, S Longpre… - arXiv preprint arXiv …, 2024 - arxiv.org
A major factor in the recent success of large language models is the use of enormous and
ever-growing text datasets for unsupervised pre-training. However, naively training a model …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang… - arXiv preprint arXiv …, 2023 - paper-notes.zhjwpku.com
Ever since the Turing Test was proposed in the 1950s, humans have explored the mastery
of language intelligence by machines. Language is essentially a complex, intricate system of …

Less: Selecting influential data for targeted instruction tuning

M Xia, S Malladi, S Gururangan, S Arora… - arXiv preprint arXiv …, 2024 - arxiv.org
Instruction tuning has unlocked powerful capabilities in large language models (LLMs),
effectively using combined datasets to develop general-purpose chatbots. However, real …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

The quantization model of neural scaling

E Michaud, Z Liu, U Girit… - Advances in Neural …, 2023 - proceedings.neurips.cc
We propose the Quantization Model of neural scaling laws, explaining both the
observed power law dropoff of loss with model and data size, and also the sudden …
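The power-law drop-off mentioned here is usually summarized with a Chinchilla-style fit; the form below is a generic illustration in LaTeX, with E, A, B, alpha, and beta as placeholder fit constants rather than the paper's own parameterization:

L(N, D) \approx E + A\,N^{-\alpha} + B\,D^{-\beta}

Here L is the pre-training loss, N the parameter count, D the number of training tokens, E an irreducible loss term, and alpha and beta fitted exponents.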

Not all tokens are what you need for pretraining

Z Lin, Z Gou, Y Gong, X Liu, R Xu… - Advances in …, 2025 - proceedings.neurips.cc
Previous language model pre-training methods have uniformly applied a next-token
prediction loss to all training tokens. Challenging this norm, we posit that "Not all tokens in a …

Rho-1: Not all tokens are what you need

Z Lin, Z Gou, Y Gong, X Liu, Y Shen, R Xu, C Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
Previous language model pre-training methods have uniformly applied a next-token
prediction loss to all training tokens. Challenging this norm, we posit that "Not all tokens in a corpus are equally important for language model training". Our …
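Both entries above argue against applying the next-token loss uniformly to every training token. The sketch below illustrates one way to implement that idea in PyTorch, keeping only tokens with high "excess loss" relative to a reference model; the function name selective_lm_loss and the keep_fraction hyperparameter are assumptions for illustration, not the authors' exact Rho-1 procedure.

# Minimal sketch of a token-selective next-token loss (an illustration of the
# general idea in these entries, not the authors' exact Rho-1 procedure).
import torch
import torch.nn.functional as F

def selective_lm_loss(logits, ref_logits, labels, keep_fraction=0.6):
    """Average next-token loss over only the highest-excess-loss tokens.

    logits, ref_logits: (batch, seq_len, vocab) outputs of the trained and
    reference models; labels: (batch, seq_len) token ids.
    keep_fraction is a hypothetical hyperparameter for this sketch.
    """
    vocab = logits.size(-1)
    # Shift so that position t predicts token t+1, then flatten.
    shift_logits = logits[:, :-1].reshape(-1, vocab)
    shift_ref = ref_logits[:, :-1].reshape(-1, vocab)
    shift_labels = labels[:, 1:].reshape(-1)

    # Per-token cross-entropy under both models.
    loss_train = F.cross_entropy(shift_logits, shift_labels, reduction="none")
    loss_ref = F.cross_entropy(shift_ref, shift_labels, reduction="none")

    # "Excess loss": tokens the current model finds hard relative to the
    # reference model. Selection is detached so it only picks indices.
    excess = (loss_train - loss_ref).detach()
    k = max(1, int(keep_fraction * excess.numel()))
    _, keep_idx = torch.topk(excess, k)
    return loss_train[keep_idx].mean()

# Toy usage with random tensors standing in for model outputs.
B, T, V = 2, 16, 100
logits = torch.randn(B, T, V, requires_grad=True)
ref_logits = torch.randn(B, T, V)
labels = torch.randint(0, V, (B, T))
print(selective_lm_loss(logits, ref_logits, labels))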

DataComp-LM: In search of the next generation of training sets for language models

J Li, A Fang, G Smyrnis, M Ivgi, M Jordan… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset
experiments with the goal of improving language models. As part of DCLM, we provide a …

A tale of tails: Model collapse as a change of scaling laws

E Dohmatob, Y Feng, P Yang, F Charton… - arXiv preprint arXiv …, 2024 - arxiv.org
As AI model size grows, neural scaling laws have become a crucial tool to predict the
improvements of large models when increasing capacity and the size of original (human or …

DsDm: Model-aware dataset selection with datamodels

L Engstrom, A Feldmann, A Madry - arXiv preprint arXiv:2401.12926, 2024 - arxiv.org
When selecting data for training large-scale models, standard practice is to filter for
examples that match human notions of data quality. Such filtering yields qualitatively clean …
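The contrast drawn here is between heuristic quality filtering and model-aware selection. A minimal sketch of the model-aware pattern, keeping the top-k candidates by an estimated effect on a target loss, follows; estimated_effect and select_top_k are hypothetical names standing in for a datamodel-style estimator, not DsDm's actual implementation.

# Minimal sketch of model-aware data selection: rank candidate examples by an
# estimated effect on target loss and keep the best k (illustrative only).
import numpy as np

def select_top_k(estimated_effect, k):
    """Return indices of the k candidates predicted to reduce target loss most.

    estimated_effect[i] is a hypothetical datamodel-style estimate of how much
    including example i changes the target loss; lower (more negative) is better.
    """
    return np.argsort(estimated_effect)[:k]

# Toy usage: 10 candidate examples, keep the 3 with the best estimated effect.
rng = np.random.default_rng(0)
scores = rng.normal(size=10)
print(select_top_k(scores, k=3))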