On-device language models: A comprehensive review

J Xu, Z Li, W Chen, Q Wang, X Gao, Q Cai… - arxiv preprint arxiv …, 2024 - arxiv.org
… capable language models (LM). Despite this, pretraining data design is critically under-documented and often guided …

A survey of multimodal large language model from a data-centric perspective

T Bai, H Liang, B Wan, Y Xu, X Li, S Li, L Yang… - arxiv preprint arxiv …, 2024 - arxiv.org
Multimodal large language models (MLLMs) enhance the capabilities of standard large
language models by integrating and processing data from multiple modalities, including text …

Language models scale reliably with over-training and on downstream tasks

SY Gadre, G Smyrnis, V Shankar, S Gururangan… - arxiv preprint arxiv …, 2024 - arxiv.org
Scaling laws are useful guides for derisking expensive training runs, as they predict
performance of large models using cheaper, small-scale experiments. However, there …
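
As a rough illustration of the extrapolation this snippet describes: the sketch below fits a generic power-law form L(N) = E + A·N^(-alpha) to a handful of invented small-model loss points and then predicts the loss of a larger, untrained model. The loss values, parameter counts, and functional form are assumptions for illustration only, not the specific fitting procedure used in the paper.

```python
# Illustrative sketch: fit a simple power-law scaling curve to hypothetical
# small-scale runs and extrapolate to a larger model. All numbers are made up.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n_params, E, A, alpha):
    # Irreducible loss E plus a power-law term that shrinks with model size.
    return E + A * n_params ** (-alpha)

# Hypothetical (parameter count, validation loss) points from cheap runs.
n_params = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
losses   = np.array([4.10, 3.72, 3.35, 3.08, 2.85])

params, _ = curve_fit(scaling_law, n_params, losses, p0=[2.0, 50.0, 0.2])
E, A, alpha = params

# Predict the loss of a 7B-parameter model that was never trained.
predicted = scaling_law(7e9, E, A, alpha)
print(f"E={E:.2f}, A={A:.1f}, alpha={alpha:.3f}; predicted loss at 7B: {predicted:.2f}")
```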

OpenMoE: An early effort on open mixture-of-experts language models

F Xue, Z Zheng, Y Fu, J Ni, Z Zheng, W Zhou… - arxiv preprint arxiv …, 2024 - arxiv.org
To help the open-source community have a better understanding of Mixture-of-Experts
(MoE) based large language models (LLMs), we train and release OpenMoE, a series of …

Blind baselines beat membership inference attacks for foundation models

D Das, J Zhang, F Tramèr - arxiv preprint arxiv:2406.16201, 2024 - arxiv.org
Membership inference (MI) attacks try to determine if a data sample was used to train a
machine learning model. For foundation models trained on unknown Web data, MI attacks …
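
A rough sketch of what a "blind" baseline means in this context: the code below contrasts a standard loss-threshold MI heuristic with a rule that never queries the model and classifies membership purely from a surface feature of the sample (here, a hypothetical publication year). All samples, losses, and thresholds are invented; the setup only mimics a distribution shift between member and non-member evaluation sets, which is the kind of shortcut such baselines exploit.

```python
# Illustrative sketch: a loss-based MI attack vs. a model-free ("blind") baseline.
from dataclasses import dataclass

@dataclass
class Sample:
    text: str
    year: int          # surface metadata (e.g., publication date)
    model_loss: float  # per-token loss the target model assigns to the text

def mi_attack_by_loss(sample: Sample, threshold: float = 2.5) -> bool:
    # Classic MI heuristic: low loss -> probably seen during training.
    return sample.model_loss < threshold

def blind_baseline_by_year(sample: Sample, cutoff: int = 2023) -> bool:
    # "Blind" rule: ignore the model and guess membership from the sample's
    # date, exploiting the fact that the non-member set happens to be newer.
    return sample.year < cutoff

# Toy evaluation set with a member/non-member distribution shift.
members     = [Sample("old web page", 2021, 1.8), Sample("old forum post", 2020, 2.1)]
non_members = [Sample("new news article", 2024, 2.9), Sample("new blog post", 2024, 2.4)]

for name, rule in [("loss attack", mi_attack_by_loss), ("blind baseline", blind_baseline_by_year)]:
    correct = sum(rule(s) for s in members) + sum(not rule(s) for s in non_members)
    print(f"{name}: {correct}/{len(members) + len(non_members)} correct")
```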

Generalization vs Memorization: Tracing Language Models' Capabilities Back to Pretraining Data

X Wang, A Antoniades, Y Elazar, A Amayuelas… - arxiv preprint arxiv …, 2024 - arxiv.org
The impressive capabilities of large language models (LLMs) have sparked debate over
whether these models genuinely generalize to unseen tasks or predominantly rely on …

Position: Key claims in LLM research have a long tail of footnotes

A Rogers, S Luccioni - Forty-first International Conference on …, 2024 - openreview.net
Much of the recent discourse within the ML community has been centered around Large
Language Models (LLMs), their functionality and potential--yet not only do we not have a …

How to train long-context language models (effectively)

T Gao, A Wettig, H Yen, D Chen - arxiv preprint arxiv:2410.02660, 2024 - arxiv.org
We study continued training and supervised fine-tuning (SFT) of a language model (LM) to
make effective use of long-context information. We first establish a reliable evaluation …