On-device language models: A comprehensive review
A survey of multimodal large language models from a data-centric perspective
Multimodal large language models (MLLMs) enhance the capabilities of standard large
language models by integrating and processing data from multiple modalities, including text …
Language models scale reliably with over-training and on downstream tasks
Scaling laws are useful guides for derisking expensive training runs, as they predict
performance of large models using cheaper, small-scale experiments. However, there …
OpenMoE: An early effort on open mixture-of-experts language models
To help the open-source community have a better understanding of Mixture-of-Experts
(MoE) based large language models (LLMs), we train and release OpenMoE, a series of …
Blind baselines beat membership inference attacks for foundation models
Membership inference (MI) attacks try to determine if a data sample was used to train a
machine learning model. For foundation models trained on unknown Web data, MI attacks …
Generalization vs. memorization: Tracing language models' capabilities back to pretraining data
The impressive capabilities of large language models (LLMs) have sparked debate over
whether these models genuinely generalize to unseen tasks or predominantly rely on …
Position: Key claims in LLM research have a long tail of footnotes
Much of the recent discourse within the ML community has been centered around Large
Language Models (LLMs), their functionality and potential--yet not only do we not have a …
How to train long-context language models (effectively)
We study continued training and supervised fine-tuning (SFT) of a language model (LM) to
make effective use of long-context information. We first establish a reliable evaluation …