Gemma 2: Improving open language models at a practical size

G Team, M Riviere, S Pathak, PG Sessa… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-
of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new …

Language model behavior: A comprehensive survey

TA Chang, BK Bergen - Computational Linguistics, 2024 - direct.mit.edu
Transformer language models have received widespread public attention, yet their
generated text is often surprising even to NLP researchers. In this survey, we discuss over …

PaLM 2 technical report

R Anil, AM Dai, O Firat, M Johnson, D Lepikhin… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and
reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is …

The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

Pythia: A suite for analyzing large language models across training and scaling

S Biderman, H Schoelkopf… - International …, 2023 - proceedings.mlr.press
How do large language models (LLMs) develop and evolve over the course of training?
How do these patterns change as models scale? To answer these questions, we introduce …

Gemma: Open models based on gemini research and technology

G Team, T Mesnard, C Hardin, R Dadashi… - arXiv preprint arXiv …, 2024 - arxiv.org
This work introduces Gemma, a family of lightweight, state-of-the-art open models built from
the research and technology used to create Gemini models. Gemma models demonstrate …

Extracting training data from diffusion models

N Carlini, J Hayes, M Nasr, M Jagielski… - 32nd USENIX Security …, 2023 - usenix.org
Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted
significant attention due to their ability to generate high-quality synthetic images. In this work …

Large language models struggle to learn long-tail knowledge

N Kandpal, H Deng, A Roberts… - International …, 2023 - proceedings.mlr.press
The Internet contains a wealth of knowledge—from the birthdays of historical figures to
tutorials on how to code—all of which may be learned by language models. However, while …

Emergent and predictable memorization in large language models

S Biderman, U Prashanth, L Sutawika… - Advances in …, 2024 - proceedings.neurips.cc
Memorization, or the tendency of large language models (LLMs) to output entire sequences
from their training data verbatim, is a key concern for deploying language models. In …

ProPILE: Probing privacy leakage in large language models

S Kim, S Yun, H Lee, M Gubri… - Advances in Neural …, 2024 - proceedings.neurips.cc
The rapid advancement and widespread use of large language models (LLMs) have raised
significant concerns regarding the potential leakage of personally identifiable information …