Wes Gurnee
Anthropic
Verified email at mit.edu
Title
Cited by
Year
Language models represent space and time
W Gurnee, M Tegmark
ICLR 2024, 2023
Cited by: 168; Year: 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
W Gurnee, N Nanda, M Pauly, K Harvey, D Troitskii, D Bertsimas
Transactions of Machine Learning Research (TMLR), 2023
Cited by: 132; Year: 2023
Refusal in language models is mediated by a single direction
A Arditi, O Obeso, A Syed, D Paleka, N Panickssery, W Gurnee, N Nanda
arXiv preprint arXiv:2406.11717, 2024
Cited by: 68*; Year: 2024
Learning sparse nonlinear dynamics via mixed-integer optimization
D Bertsimas, W Gurnee
Nonlinear Dynamics 111 (7), 6585-6604, 2023
Cited by: 45; Year: 2023
Fairmandering: A column generation heuristic for fairness-optimized political districting
W Gurnee, DB Shmoys
SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), 88-99, 2021
Cited by: 42; Year: 2021
Not all language model features are linear
J Engels, EJ Michaud, I Liao, W Gurnee, M Tegmark
arXiv preprint arXiv:2405.14860, 2024
Cited by: 35*; Year: 2024
Universal neurons in GPT2 language models
W Gurnee, T Horsley, ZC Guo, TR Kheirkhah, Q Sun, W Hathaway, ...
Transactions of Machine Learning Research (TMLR), 2024
Cited by: 22*; Year: 2024
The Remarkable Robustness of LLMs: Stages of Inference?
V Lad, W Gurnee, M Tegmark
arXiv preprint arXiv:2406.19384, 2024
Cited by: 19*; Year: 2024
Combatting gerrymandering with social choice: The design of multi-member districts
N Garg, W Gurnee, D Rothschild, D Shmoys
Proceedings of the 23rd ACM Conference on Economics and Computation, 560-561, 2022
Cited by: 14; Year: 2022
SAE reconstruction errors are (empirically) pathological
W Gurnee
AI Alignment Forum, 16, 2024
Cited by: 8*; Year: 2024
Confidence regulation neurons in language models
A Stolfo, B Wu, W Gurnee, Y Belinkov, X Song, M Sachan, N Nanda
arXiv preprint arXiv:2406.16254, 2024
Cited by: 6*; Year: 2024
Language models represent space and time. arXiv
W Gurnee, M Tegmark
arXiv preprint arXiv:2310.02207, 2024
Cited by: 5; Year: 2024
Language models represent space and time (arXiv: 2310.02207). arXiv
W Gurnee, M Tegmark
Cited by: 5; Year: 2023
Training Dynamics of Contextual N-Grams in Language Models
L Quirke, L Heindrich, W Gurnee, N Nanda
NeurIPS 2023 Workshop on Attributing Model Behavior at Scale, 2023
Cited by: 4; Year: 2023
Multilevel interpretability of artificial neural networks: leveraging framework and methods from neuroscience
Z He, J Achterberg, K Collins, K Nejad, D Akarca, Y Yang, W Gurnee, ...
arXiv preprint arXiv:2408.12664, 2024
Cited by: 1; Year: 2024
Scalable approximations of capacitated k-medians for political districting
W Gurnee
Technical report, Cornell University, Ithaca, United States, 2020
Cited by: 1; Year: 2020