Gemma 2: Improving open language models at a practical size
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-
of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new …
of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new …
Language model behavior: A comprehensive survey
Transformer language models have received widespread public attention, yet their
generated text is often surprising even to NLP researchers. In this survey, we discuss over …
generated text is often surprising even to NLP researchers. In this survey, we discuss over …
Palm 2 technical report
We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and
reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is …
reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is …
The llama 3 herd of models
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …
presents a new set of foundation models, called Llama 3. It is a herd of language models …
Pythia: A suite for analyzing large language models across training and scaling
How do large language models (LLMs) develop and evolve over the course of training?
How do these patterns change as models scale? To answer these questions, we introduce …
How do these patterns change as models scale? To answer these questions, we introduce …
Gemma: Open models based on gemini research and technology
This work introduces Gemma, a family of lightweight, state-of-the art open models built from
the research and technology used to create Gemini models. Gemma models demonstrate …
the research and technology used to create Gemini models. Gemma models demonstrate …
Extracting training data from diffusion models
Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted
significant attention due to their ability to generate high-quality synthetic images. In this work …
significant attention due to their ability to generate high-quality synthetic images. In this work …
Large language models struggle to learn long-tail knowledge
The Internet contains a wealth of knowledge—from the birthdays of historical figures to
tutorials on how to code—all of which may be learned by language models. However, while …
tutorials on how to code—all of which may be learned by language models. However, while …
Emergent and predictable memorization in large language models
Memorization, or the tendency of large language models (LLMs) to output entire sequences
from their training data verbatim, is a key concern for deploying language models. In …
from their training data verbatim, is a key concern for deploying language models. In …
Propile: Probing privacy leakage in large language models
The rapid advancement and widespread use of large language models (LLMs) have raised
significant concerns regarding the potential leakage of personally identifiable information …
significant concerns regarding the potential leakage of personally identifiable information …