Post-hoc interpretability for neural NLP: A survey
Neural networks for NLP are becoming increasingly complex and widespread, and there is
growing concern about whether these models are responsible to use. Explaining models helps to address …
Probing classifiers: Promises, shortcomings, and advances
Y. Belinkov, Computational Linguistics, 2022
Probing classifiers have emerged as one of the prominent methodologies for interpreting
and analyzing deep neural network models of natural language processing. The basic idea …
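The snippet cuts off before spelling out the recipe, but the standard probing setup is: freeze the model, extract hidden representations, and train a small supervised classifier (the probe) on them, reading probe accuracy as evidence of what the representations encode. Below is a minimal sketch under assumed choices (the bert-base-uncased checkpoint, a toy part-of-speech task, and logistic regression as the probe; none of these specifics come from the entry above).

```python
# Minimal probing-classifier sketch: frozen model, trainable linear probe.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()  # the model stays frozen; only the probe is trained

sentences = [["The", "cat", "sat", "on", "the", "mat", "."],
             ["Dogs", "bark", "loudly", "."]]
pos_tags = [["DET", "NOUN", "VERB", "ADP", "DET", "NOUN", "PUNCT"],
            ["NOUN", "VERB", "ADV", "PUNCT"]]

feats, labels = [], []
with torch.no_grad():
    for words, tags in zip(sentences, pos_tags):
        enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
        hidden = model(**enc).last_hidden_state[0]  # frozen representations
        for i, tag in enumerate(tags):
            tok = enc.word_ids().index(i)  # first subword of word i
            feats.append(hidden[tok].numpy())
            labels.append(tag)

# High probe accuracy is read as evidence that the representations encode the
# property; real studies evaluate on held-out data against simple baselines.
probe = LogisticRegression(max_iter=1000).fit(np.array(feats), labels)
print("probe accuracy (train):", probe.score(np.array(feats), labels))
```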
BLOOM: A 176B-parameter open-access multilingual language model
Large language models (LLMs) have been shown to perform new tasks from
a few demonstrations or natural language instructions. While these capabilities have led to …
Locating and editing factual associations in GPT
We analyze the storage and recall of factual associations in autoregressive transformer
language models, finding evidence that these associations correspond to localized, directly …
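The snippet truncates at "directly …" (directly editable). As an illustration of what a rank-one weight edit does in isolation, here is a toy sketch; this is not the paper's actual ROME procedure (which locates the target layer via causal tracing and derives the key and value vectors from data), and every tensor below is synthetic.

```python
# Toy rank-one weight edit: the bare operation behind "directly editable"
# factual associations. All values are synthetic placeholders.
import torch

torch.manual_seed(0)
W = torch.randn(8, 8)    # stand-in for a mid-layer MLP projection matrix
k = torch.randn(8)       # "key": representation of the subject being edited
v_new = torch.randn(8)   # desired output ("value") for that key

# Rank-one update mapping k to v_new; directions orthogonal to k are untouched
delta = torch.outer(v_new - W @ k, k) / (k @ k)
W_edited = W + delta

assert torch.allclose(W_edited @ k, v_new, atol=1e-5)  # the edit takes effect
```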
On the opportunities and risks of foundation models
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …
Do vision transformers see like convolutional neural networks?
Convolutional neural networks (CNNs) have so far been the de facto model for visual data.
Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or …
Fast model editing at scale
While large pre-trained models have enabled impressive results on a variety of downstream
tasks, the largest existing models still make errors, and even accurate predictions may …
Interpretability at scale: Identifying causal mechanisms in Alpaca
Obtaining human-interpretable explanations of large, general-purpose language models is
an urgent goal for AI safety. However, it is just as important that our interpretability methods …
Pre-trained models: Past, present and future
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved
great success and become a milestone in the field of artificial intelligence (AI). Owing to …
Physics of language models: Part 3.1, knowledge storage and extraction
Large language models (LLMs) can store a vast amount of world knowledge, often
extractable via question-answering (e.g., "What is Abraham Lincoln's birthday?"). However …