Gemma 2: Improving open language models at a practical size

Gemma Team, M Riviere, S Pathak, PG Sessa… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-
of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new …

Language model behavior: A comprehensive survey

TA Chang, BK Bergen - Computational Linguistics, 2024 - direct.mit.edu
Transformer language models have received widespread public attention, yet their
generated text is often surprising even to NLP researchers. In this survey, we discuss over …

PaLM 2 technical report

R Anil, AM Dai, O Firat, M Johnson, D Lepikhin… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and
reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is …

The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

Pythia: A suite for analyzing large language models across training and scaling

S Biderman, H Schoelkopf… - International …, 2023 - proceedings.mlr.press
How do large language models (LLMs) develop and evolve over the course of training?
How do these patterns change as models scale? To answer these questions, we introduce …

Gemma: Open models based on Gemini research and technology

Gemma Team, T Mesnard, C Hardin, R Dadashi… - arXiv preprint arXiv …, 2024 - arxiv.org
This work introduces Gemma, a family of lightweight, state-of-the-art open models built from
the research and technology used to create Gemini models. Gemma models demonstrate …

Extracting training data from diffusion models

N Carlini, J Hayes, M Nasr, M Jagielski… - 32nd USENIX Security …, 2023 - usenix.org
Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted
significant attention due to their ability to generate high-quality synthetic images. In this work …

Large language models struggle to learn long-tail knowledge

N Kandpal, H Deng, A Roberts… - International …, 2023 - proceedings.mlr.press
The Internet contains a wealth of knowledge—from the birthdays of historical figures to
tutorials on how to code—all of which may be learned by language models. However, while …

Analyzing leakage of personally identifiable information in language models

N Lukas, A Salem, R Sim, S Tople… - … IEEE Symposium on …, 2023 - ieeexplore.ieee.org
Language Models (LMs) have been shown to leak information about training data through
sentence-level membership inference and reconstruction attacks. Understanding the risk of …

Emergent and predictable memorization in large language models

S Biderman, U Prashanth, L Sutawika… - Advances in …, 2023 - proceedings.neurips.cc
Memorization, or the tendency of large language models (LLMs) to output entire sequences
from their training data verbatim, is a key concern for deploying language models. In …