A survey of large language models

WX Zhao, K Zhou, J Li, T Tang… - arXiv preprint arXiv …, 2023 - paper-notes.zhjwpku.com
Ever since the Turing Test was proposed in the 1950s, humans have explored the mastery
of language intelligence by machines. Language is essentially a complex, intricate system of …

Scaling Laws for Data Filtering--Data Curation cannot be Compute Agnostic

S Goyal, P Maini, ZC Lipton… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision-language models (VLMs) are trained for thousands of GPU hours on carefully
selected subsets of massive web scrapes. For instance, the LAION public dataset retained …
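
As a minimal, hypothetical sketch of the filtering setup in Python: a quality scorer ranks the candidate pool and a retention fraction decides how much survives, the paper's point being that the best fraction should depend on the training compute budget. The scorer and the two budgets below are illustrative assumptions, not the authors' pipeline.

    def filter_pool(examples, quality_score, retained_fraction):
        """Keep the top `retained_fraction` of examples by quality score."""
        ranked = sorted(examples, key=quality_score, reverse=True)
        keep = max(1, int(len(ranked) * retained_fraction))
        return ranked[:keep]

    # Hypothetical scorer: document length as a stand-in for a real quality model
    # (e.g., a CLIP-score or perplexity filter in practice).
    pool = ["short doc", "a somewhat longer document", "a long, detailed, high-quality document"]
    low_compute_subset = filter_pool(pool, quality_score=len, retained_fraction=0.3)   # filter aggressively
    high_compute_subset = filter_pool(pool, quality_score=len, retained_fraction=0.8)  # keep more to limit repetition
    print(low_compute_subset)
    print(high_compute_subset)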

DataComp-LM: In search of the next generation of training sets for language models

J Li, A Fang, G Smyrnis, M Ivgi, M Jordan… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset
experiments with the goal of improving language models. As part of DCLM, we provide a …

Scaling synthetic data creation with 1,000,000,000 personas

T Ge, X Chan, X Wang, D Yu, H Mi, D Yu - arXiv preprint arXiv:2406.20094, 2024 - arxiv.org
We propose a novel persona-driven data synthesis methodology that leverages various
perspectives within a large language model (LLM) to create diverse synthetic data. To fully …
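
As a minimal sketch of persona-conditioned synthesis, assuming a generic LLM call: the same task prompt is paired with different personas so that the completions vary in perspective. The persona strings, the prompt template, and the generate stub are illustrative assumptions, not Persona Hub's actual pipeline.

    personas = [
        "a pediatric nurse explaining ideas to worried parents",
        "a competitive programmer who optimizes for runtime",
        "a medieval historian specializing in trade routes",
    ]

    def build_prompt(persona, task):
        return f"You are {persona}. {task}"

    def generate(prompt):
        # Placeholder for a call to whatever LLM API is available.
        return f"<completion for: {prompt}>"

    task = "Write a challenging math word problem."
    synthetic_examples = [generate(build_prompt(p, task)) for p in personas]
    for example in synthetic_examples:
        print(example)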

CinePile: A long video question answering dataset and benchmark

R Rawal, K Saifullah, M Farré, R Basri… - arXiv preprint arXiv …, 2024 - arxiv.org
Current datasets for long-form video understanding often fall short of providing genuine long-
form comprehension challenges, as many tasks derived from these datasets can be …

Zamba: A compact 7B SSM hybrid model

P Glorioso, Q Anthony, Y Tokpanov… - arXiv preprint arXiv …, 2024 - arxiv.org
In this technical report, we present Zamba, a novel 7B SSM-transformer hybrid model which
achieves competitive performance against leading open-weight models at a comparable …

Reverse training to nurse the reversal curse

O Golovneva, Z Allen-Zhu, J Weston… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have a surprising failure: when trained on "A has a feature
B", they do not generalize to "B is a feature of A", which is termed the Reversal Curse. Even …

Instruction pre-training: Language models are supervised multitask learners

D Cheng, Y Gu, S Huang, J Bi, M Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
Unsupervised multitask pre-training has been the critical method behind the recent success
of language models (LMs). However, supervised multitask learning still holds significant …
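
A minimal sketch of instruction-augmented pre-training data construction, with a stub standing in for the paper's learned instruction synthesizer: each raw document is extended with instruction-response pairs grounded in it, and the result is used as ordinary pretraining text.

    def synthesize_pairs(document):
        # Placeholder: in the paper this role is played by a fine-tuned synthesizer
        # model that reads the document and emits grounded instruction-response pairs.
        return [("Summarize the passage in one sentence.", document[:60] + " ...")]

    def augment(document):
        pairs = synthesize_pairs(document)
        qa_block = "\n".join(f"Instruction: {q}\nResponse: {a}" for q, a in pairs)
        return f"{document}\n{qa_block}"

    raw_docs = ["Photosynthesis converts light energy into chemical energy stored as glucose."]
    pretraining_corpus = [augment(d) for d in raw_docs]
    print(pretraining_corpus[0])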

A survey on data synthesis and augmentation for large language models

K Wang, J Zhu, M Ren, Z Liu, S Li, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The success of Large Language Models (LLMs) is inherently linked to the availability of vast,
diverse, and high-quality data for training and evaluation. However, the growth rate of high …

MATES: Model-aware data selection for efficient pretraining with data influence models

Z Yu, S Das, C Xiong - Advances in Neural Information …, 2025 - proceedings.neurips.cc
Pretraining data selection has the potential to improve language model pretraining efficiency
by utilizing higher-quality data from massive web data corpora. Current data selection …
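
A minimal sketch of influence-model-based selection, with a random stub standing in for the learned data influence model: candidates are scored for their estimated benefit to the current main model and the top-scoring ones are chosen for the next pretraining stage. MATES additionally refreshes the influence model as the main model evolves, which is omitted here.

    import random

    def influence_score(example):
        # Placeholder for a learned data influence model's prediction of how much
        # this example would improve the current main model.
        return random.random()

    def select_batch(pool, k):
        """Pick the k candidates with the highest predicted influence."""
        return sorted(pool, key=influence_score, reverse=True)[:k]

    pool = [f"web document {i}" for i in range(100)]
    next_stage_data = select_batch(pool, k=10)
    print(next_stage_data[:3])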