Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Pythia: A suite for analyzing large language models across training and scaling

S Biderman, H Schoelkopf… - International …, 2023 - proceedings.mlr.press
How do large language models (LLMs) develop and evolve over the course of training?
How do these patterns change as models scale? To answer these questions, we introduce …

Emergent and predictable memorization in large language models

S Biderman, U Prashanth, L Sutawika… - Advances in …, 2024 - proceedings.neurips.cc
Memorization, or the tendency of large language models (LLMs) to output entire sequences
from their training data verbatim, is a key concern for deploying language models. In …

Llemma: An open language model for mathematics

Z Azerbayev, H Schoelkopf, K Paster… - arXiv preprint arXiv …, 2023 - arxiv.org
We present Llemma, a large language model for mathematics. We continue pretraining
Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing …

Representation engineering: A top-down approach to AI transparency

A Zou, L Phan, S Chen, J Campbell, P Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we identify and characterize the emerging area of representation engineering
(RepE), an approach to enhancing the transparency of AI systems that draws on insights …

Does circuit analysis interpretability scale? Evidence from multiple choice capabilities in Chinchilla

T Lieberum, M Rahtz, J Kramár, N Nanda… - arXiv preprint arXiv …, 2023 - arxiv.org
Circuit analysis is a promising technique for understanding the internal mechanisms
of language models. However, existing analyses are done in small models far from the state …

Function vectors in large language models

E Todd, ML Li, AS Sharma, A Mueller… - arXiv preprint arXiv …, 2023 - arxiv.org
We report the presence of a simple neural mechanism that represents an input-output
function as a vector within autoregressive transformer language models (LMs). Using causal …

Linearity of relation decoding in transformer language models

E Hernandez, AS Sharma, T Haklay, K Meng… - arXiv preprint arXiv …, 2023 - arxiv.org
Much of the knowledge encoded in transformer language models (LMs) may be expressed
in terms of relations: relations between words and their synonyms, entities and their …

Rethinking interpretability in the era of large language models

C Singh, JP Inala, M Galley, R Caruana… - arXiv preprint arXiv …, 2024 - arxiv.org
Interpretable machine learning has exploded as an area of interest over the last decade,
sparked by the rise of increasingly large datasets and deep neural networks …