On-device language models: A comprehensive review

J Xu, Z Li, W Chen, Q Wang, X Gao, Q Cai… - arxiv preprint arxiv …, 2024 - arxiv.org
… capable language models (LM). Despite this, pretraining data design is critically under-documented and often guided …

A survey of multimodal large language model from a data-centric perspective

T Bai, H Liang, B Wan, Y Xu, X Li, S Li, L Yang… - arxiv preprint arxiv …, 2024 - arxiv.org
Multimodal large language models (MLLMs) enhance the capabilities of standard large
language models by integrating and processing data from multiple modalities, including text …

Language models scale reliably with over-training and on downstream tasks

SY Gadre, G Smyrnis, V Shankar, S Gururangan… - arxiv preprint arxiv …, 2024 - arxiv.org
Scaling laws are useful guides for derisking expensive training runs, as they predict
performance of large models using cheaper, small-scale experiments. However, there …
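
As a rough illustration of the extrapolation this snippet describes: the sketch below fits a generic power-law form L(N) = E + A·N^(-alpha) to a handful of invented small-model loss points and then predicts the loss of a larger, untrained model. The loss values, parameter counts, and functional form are assumptions for illustration only, not the specific fitting procedure used in the paper.

```python
# Illustrative sketch: fit a simple power-law scaling curve to hypothetical
# small-scale runs and extrapolate to a larger model. All numbers are made up.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n_params, E, A, alpha):
    # Irreducible loss E plus a power-law term that shrinks with model size.
    return E + A * n_params ** (-alpha)

# Hypothetical (parameter count, validation loss) points from cheap runs.
n_params = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
losses   = np.array([4.10, 3.72, 3.35, 3.08, 2.85])

params, _ = curve_fit(scaling_law, n_params, losses, p0=[2.0, 50.0, 0.2])
E, A, alpha = params

# Predict the loss of a 7B-parameter model that was never trained.
predicted = scaling_law(7e9, E, A, alpha)
print(f"E={E:.2f}, A={A:.1f}, alpha={alpha:.3f}; predicted loss at 7B: {predicted:.2f}")
```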

OpenMoE: An early effort on open mixture-of-experts language models

F Xue, Z Zheng, Y Fu, J Ni, Z Zheng, W Zhou… - arxiv preprint arxiv …, 2024 - arxiv.org
To help the open-source community have a better understanding of Mixture-of-Experts
(MoE) based large language models (LLMs), we train and release OpenMoE, a series of …

Blind baselines beat membership inference attacks for foundation models

D Das, J Zhang, F Tramèr - arxiv preprint arxiv:2406.16201, 2024 - arxiv.org
Membership inference (MI) attacks try to determine if a data sample was used to train a
machine learning model. For foundation models trained on unknown Web data, MI attacks …
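
A rough sketch of what a "blind" baseline means in this context: the code below contrasts a standard loss-threshold MI heuristic with a rule that never queries the model and classifies membership purely from a surface feature of the sample (here, a hypothetical publication year). All samples, losses, and thresholds are invented; the setup only mimics a distribution shift between member and non-member evaluation sets, which is the kind of shortcut such baselines exploit.

```python
# Illustrative sketch: a loss-based MI attack vs. a model-free ("blind") baseline.
from dataclasses import dataclass

@dataclass
class Sample:
    text: str
    year: int          # surface metadata (e.g., publication date)
    model_loss: float  # per-token loss the target model assigns to the text

def mi_attack_by_loss(sample: Sample, threshold: float = 2.5) -> bool:
    # Classic MI heuristic: low loss -> probably seen during training.
    return sample.model_loss < threshold

def blind_baseline_by_year(sample: Sample, cutoff: int = 2023) -> bool:
    # "Blind" rule: ignore the model and guess membership from the sample's
    # date, exploiting the fact that the non-member set happens to be newer.
    return sample.year < cutoff

# Toy evaluation set with a member/non-member distribution shift.
members     = [Sample("old web page", 2021, 1.8), Sample("old forum post", 2020, 2.1)]
non_members = [Sample("new news article", 2024, 2.9), Sample("new blog post", 2024, 2.4)]

for name, rule in [("loss attack", mi_attack_by_loss), ("blind baseline", blind_baseline_by_year)]:
    correct = sum(rule(s) for s in members) + sum(not rule(s) for s in non_members)
    print(f"{name}: {correct}/{len(members) + len(non_members)} correct")
```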

Generalization vs Memorization: Tracing Language Models' Capabilities Back to Pretraining Data

X Wang, A Antoniades, Y Elazar, A Amayuelas… - arxiv preprint arxiv …, 2024 - arxiv.org
The impressive capabilities of large language models (LLMs) have sparked debate over
whether these models genuinely generalize to unseen tasks or predominantly rely on …

Position: Key claims in LLM research have a long tail of footnotes

A Rogers, S Luccioni - Forty-first International Conference on …, 2024 - openreview.net
Much of the recent discourse within the ML community has been centered around Large
Language Models (LLMs), their functionality and potential--yet not only do we not have a …

How to train long-context language models (effectively)

T Gao, A Wettig, H Yen, D Chen - arxiv preprint arxiv:2410.02660, 2024 - arxiv.org
We study continued training and supervised fine-tuning (SFT) of a language model (LM) to
make effective use of long-context information. We first establish a reliable evaluation …