Probing classifiers: Promises, shortcomings, and advances

Y Belinkov - Computational Linguistics, 2022 - direct.mit.edu
Probing classifiers have emerged as one of the prominent methodologies for interpreting
and analyzing deep neural network models of natural language processing. The basic idea …
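
The snippet cuts off before the basic idea is stated; in brief, a probing classifier is a simple supervised model trained to predict a linguistic property from a network's frozen internal representations. Below is a minimal sketch under assumed inputs: the random arrays stand in for extracted hidden states and labels, and nothing in the code comes from the paper itself.

```python
# Minimal probing-classifier sketch. Assumption: hidden states have already been
# extracted from a frozen NLP model; random arrays stand in for real data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(2000, 768))      # placeholder: one 768-d vector per token
property_labels = rng.integers(0, 5, size=2000)   # placeholder linguistic property (e.g., coarse POS tag)

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, property_labels, test_size=0.2, random_state=0
)

# The probe: a simple classifier fit on the frozen representations. Its held-out
# accuracy is read as evidence of how decodable the property is from that layer.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))
```

Much of the survey's discussion of shortcomings concerns how to interpret such accuracies, for example the need for baselines and control tasks before attributing the property to the model rather than to the probe.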

Post-hoc interpretability for neural NLP: A survey

A Madsen, S Reddy, S Chandar - ACM Computing Surveys, 2022 - dl.acm.org
Neural networks for NLP are becoming increasingly complex and widespread, and there is
growing concern about whether these models are responsible to use. Explaining models helps to address …

BLOOM: A 176B-parameter open-access multilingual language model

T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow… - 2023 - inria.hal.science
Large language models (LLMs) have been shown to be able to perform new tasks based on
a few demonstrations or natural language instructions. While these capabilities have led to …

Emergent world representations: Exploring a sequence model trained on a synthetic task

K Li, AK Hopkins, D Bau, F Viégas, H Pfister… - ICLR, 2023 - par.nsf.gov
Language models show a surprising range of capabilities, but the source of their apparent
competence is unclear. Do these networks just memorize a collection of surface statistics, or …

Locating and editing factual associations in GPT

K Meng, D Bau, A Andonian… - Advances in neural …, 2022 - proceedings.neurips.cc
We analyze the storage and recall of factual associations in autoregressive transformer
language models, finding evidence that these associations correspond to localized, directly …
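
The snippet does not show the editing mechanism, so the following is a toy illustration only: a generic rank-one update that rewires a single linear layer so a chosen key vector maps to a chosen value vector. It is not the paper's ROME method, which additionally locates the relevant layer via causal tracing and constrains the update with key-covariance statistics so that other associations are preserved; the names W, k_star, and v_star are illustrative.

```python
# Toy rank-one edit of a linear layer: after the update, W_new @ k_star == v_star.
# This is only the bare linear-algebra core, NOT the paper's ROME procedure.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 64, 64
W = rng.normal(size=(d_out, d_in))   # stand-in for an MLP projection matrix
k_star = rng.normal(size=d_in)       # key vector (e.g., representing the subject)
v_star = rng.normal(size=d_out)      # desired value vector (e.g., encoding the new fact)

# Minimum-Frobenius-norm update that solves W_new @ k_star = v_star:
delta = np.outer(v_star - W @ k_star, k_star) / (k_star @ k_star)
W_new = W + delta

print(np.allclose(W_new @ k_star, v_star))  # True: the edited key now maps to v_star
```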

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Do vision transformers see like convolutional neural networks?

M Raghu, T Unterthiner, S Kornblith… - Advances in neural …, 2021 - proceedings.neurips.cc
Convolutional neural networks (CNNs) have so far been the de facto model for visual data.
Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or …
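
The main tool this paper uses to compare ViT and CNN layers is centered kernel alignment (CKA). The sketch below implements linear CKA on placeholder activations; the helper name linear_cka and the random matrices are ours, not from the paper or any library.

```python
# Minimal linear CKA between two activation matrices (rows = examples, columns = features).
# Random matrices stand in for real ViT / CNN layer activations.
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA similarity between representations X and Y (same rows, any number of columns)."""
    X = X - X.mean(axis=0, keepdims=True)   # center each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    numerator = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    denominator = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return float(numerator / denominator)

rng = np.random.default_rng(0)
acts_a = rng.normal(size=(2000, 128))                    # stand-in for one layer's activations
Q, _ = np.linalg.qr(rng.normal(size=(128, 128)))         # random orthogonal transform
print(linear_cka(acts_a, acts_a @ Q))                    # ~1.0: CKA is invariant to rotations of the feature space
print(linear_cka(acts_a, rng.normal(size=(2000, 128))))  # much lower for unrelated activations
```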

Black-box access is insufficient for rigorous AI audits

S Casper, C Ezell, C Siegmann, N Kolt… - Proceedings of the …, 2024 - dl.acm.org
External audits of AI systems are increasingly recognized as a key mechanism for AI
governance. The effectiveness of an audit, however, depends on the degree of access …

Fast model editing at scale

E Mitchell, C Lin, A Bosselut, C Finn… - arXiv preprint arXiv …, 2021 - arxiv.org
While large pre-trained models have enabled impressive results on a variety of downstream
tasks, the largest existing models still make errors, and even accurate predictions may …

Rethinking interpretability in the era of large language models

C Singh, JP Inala, M Galley, R Caruana… - arXiv preprint arXiv …, 2024 - arxiv.org
Interpretable machine learning has exploded as an area of interest over the last decade,
sparked by the rise of increasingly large datasets and deep neural networks …