Challenges and applications of large language models
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …
AI alignment: A comprehensive survey
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …
Pythia: A suite for analyzing large language models across training and scaling
How do large language models (LLMs) develop and evolve over the course of training?
How do these patterns change as models scale? To answer these questions, we introduce …
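The suite's defining feature is that every model ships with intermediate training checkpoints. A minimal sketch of loading two of them for comparison, assuming the EleutherAI/pythia-70m repository on the Hugging Face Hub, where checkpoints are published as git revisions such as "step1000" (repository and revision names are assumptions about the released artifacts, not taken from the abstract):
```python
# Sketch: comparing a Pythia model early and late in training.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
early = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m", revision="step1000")
final = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")  # last checkpoint

inputs = tokenizer("The capital of France is", return_tensors="pt")
for name, model in [("step1000", early), ("final", final)]:
    logits = model(**inputs).logits[0, -1]  # next-token logits at the last position
    print(name, "->", tokenizer.decode([logits.argmax().item()]))
```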
Emergent and predictable memorization in large language models
Memorization, or the tendency of large language models (LLMs) to output entire sequences
from their training data verbatim, is a key concern for deploying language models. In …
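A minimal sketch of the kind of memorization probe the paper studies: a sequence counts as memorized if, prompted with its first k tokens, greedy decoding reproduces the next m tokens exactly. The function name, model choice, and k/m defaults here are illustrative assumptions, not the authors' code.
```python
# Sketch: a simple verbatim-memorization check for a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-70m"  # assumption: any causal LM works here
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

def is_memorized(text: str, k: int = 32, m: int = 32) -> bool:
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    if len(ids) < k + m:
        return False
    prompt, target = ids[:k], ids[k:k + m]
    with torch.no_grad():
        out = model.generate(prompt.unsqueeze(0), max_new_tokens=m, do_sample=False)
    # generate() returns prompt + continuation; compare the continuation only.
    return torch.equal(out[0, k:k + m], target)
```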
Llemma: An open language model for mathematics
We present Llemma, a large language model for mathematics. We continue pretraining
Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing …
Representation engineering: A top-down approach to AI transparency
In this paper, we identify and characterize the emerging area of representation engineering
(RepE), an approach to enhancing the transparency of AI systems that draws on insights …
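One common RepE-style recipe, reconstructed here from the general idea rather than the paper's released code: take the difference of mean hidden states over contrastive prompt sets to get a "reading" direction, then score new inputs by projection. The model and stimuli below are stand-in assumptions.
```python
# Sketch: a difference-of-means reading vector and projection score.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-70m"  # assumption: small stand-in model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, output_hidden_states=True)

def last_hidden(text: str) -> torch.Tensor:
    """Final-layer representation of the prompt's last token."""
    ids = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        return model(**ids).hidden_states[-1][0, -1]

# Contrastive stimuli; the difference of means defines a candidate direction.
pos = ["Pretend you are an honest person and describe your day."]
neg = ["Pretend you are a dishonest person and describe your day."]
direction = (torch.stack([last_hidden(t) for t in pos]).mean(0)
             - torch.stack([last_hidden(t) for t in neg]).mean(0))

# Score a new input by projecting its representation onto the direction.
print((last_hidden("I would never lie to you.") @ direction).item())
```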
Does circuit analysis interpretability scale? Evidence from multiple choice capabilities in Chinchilla
Circuit analysis is a promising technique for understanding the internal mechanisms
of language models. However, existing analyses are done in small models far from the state …
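The core move in circuit analysis is activation patching: cache an activation from a "clean" run, splice it into a "corrupted" run, and measure how much of the behavior it restores. A minimal sketch at the whole-layer level (the paper works per-component in Chinchilla; the small model, layer index, and prompts here are illustrative assumptions):
```python
# Sketch: whole-layer activation patching with forward hooks.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-70m"  # assumption: small stand-in model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
layer = model.gpt_neox.layers[3]  # assumption: patch one whole layer's output

# Prompts chosen to tokenize to the same length, which the patch requires.
clean = tokenizer("The capital of France is", return_tensors="pt")
corrupt = tokenizer("The capital of Italy is", return_tensors="pt")

cache = {}
def save(mod, inp, out):
    cache["h"] = out[0]
def patch(mod, inp, out):
    return (cache["h"],) + out[1:]

with torch.no_grad():
    handle = layer.register_forward_hook(save)
    model(**clean)          # cache the clean activation
    handle.remove()
    handle = layer.register_forward_hook(patch)
    logits = model(**corrupt).logits[0, -1]  # corrupted run, patched layer
    handle.remove()

# Assumes " Paris" / " Rome" are single tokens in this tokenizer.
paris = tokenizer(" Paris").input_ids[0]
rome = tokenizer(" Rome").input_ids[0]
print("patched logit diff (Paris - Rome):", (logits[paris] - logits[rome]).item())
```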
Function vectors in large language models
We report the presence of a simple neural mechanism that represents an input-output
function as a vector within autoregressive transformer language models (LMs). Using causal …
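The paper derives function vectors from causally implicated attention heads via mediation analysis; as a much cruder stand-in, the sketch below averages the last-token hidden state over in-context antonym prompts and adds it to a layer's output during a zero-shot run. The model, layer, injection at all positions, and prompts are all assumptions made for illustration.
```python
# Sketch: injecting a simplified "function vector" into the residual stream.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-70m"  # assumption: small stand-in model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, output_hidden_states=True)
layer_idx = 4  # assumption: injection layer chosen arbitrarily

icl_prompts = [
    "hot: cold, big: small, fast: slow, up:",
    "tall: short, open: closed, light: dark, wet:",
]
states = []
for p in icl_prompts:
    ids = tokenizer(p, return_tensors="pt")
    with torch.no_grad():
        states.append(model(**ids).hidden_states[layer_idx][0, -1])
fv = torch.stack(states).mean(0)  # averaged "antonym task" state

def add_fv(mod, inp, out):
    # Simplification: the vector is added at every position, not one token.
    return (out[0] + fv,) + out[1:]

handle = model.gpt_neox.layers[layer_idx].register_forward_hook(add_fv)
ids = tokenizer("good:", return_tensors="pt")
with torch.no_grad():
    logits = model(**ids).logits[0, -1]
handle.remove()
print(tokenizer.decode([logits.argmax().item()]))
```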
Linearity of relation decoding in transformer language models
Much of the knowledge encoded in transformer language models (LMs) may be expressed
in terms of relations: relations between words and their synonyms, entities and their …
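The paper's claim is that, for many relations, decoding is well approximated by an affine map o ≈ W s + b on subject representations. A minimal sketch of fitting such a map by ordinary least squares, with random tensors standing in for the subject/object representations one would actually extract from an LM:
```python
# Sketch: fitting the affine relation approximation o ≈ W s + b.
import torch

d = 64          # hidden size (toy)
n_pairs = 200   # (subject, object) examples for one relation

S = torch.randn(n_pairs, d)            # subject representations (toy)
W_true = torch.randn(d, d) / d ** 0.5  # hidden ground-truth map (toy)
b_true = torch.randn(d)
O = S @ W_true.T + b_true + 0.01 * torch.randn(n_pairs, d)

# Augment with a ones column so the bias is estimated jointly: O ≈ [S 1][W; b].
S1 = torch.cat([S, torch.ones(n_pairs, 1)], dim=1)
sol = torch.linalg.lstsq(S1, O).solution   # shape (d + 1, d)
W_hat, b_hat = sol[:-1].T, sol[-1]

pred = S @ W_hat.T + b_hat
print("relative error:", ((pred - O).norm() / O.norm()).item())
```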
Rethinking interpretability in the era of large language models
Interpretable machine learning has exploded as an area of interest over the last decade,
sparked by the rise of increasingly large datasets and deep neural networks …