Wes Gurnee
Anthropic
Verified email at mit.edu
Title
Cited by
Year
Language models represent space and time
W Gurnee, M Tegmark
ICLR 2024, 2023
Cited by: 198* | Year: 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
W Gurnee, N Nanda, M Pauly, K Harvey, D Troitskii, D Bertsimas
Transactions of Machine Learning Research (TMLR), 2023
Cited by: 146* | Year: 2023
Refusal in language models is mediated by a single direction
A Arditi, O Obeso, A Syed, D Paleka, N Panickssery, W Gurnee, N Nanda
arXiv preprint arXiv:2406.11717, 2024
Cited by: 90* | Year: 2024
Learning sparse nonlinear dynamics via mixed-integer optimization
D Bertsimas, W Gurnee
Nonlinear Dynamics 111 (7), 6585-6604, 2023
Cited by: 47 | Year: 2023
Not all language model features are linear
J Engels, EJ Michaud, I Liao, W Gurnee, M Tegmark
arXiv preprint arXiv:2405.14860, 2024
Cited by: 44* | Year: 2024
Fairmandering: A column generation heuristic for fairness-optimized political districting
W Gurnee, DB Shmoys
SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), 88-99, 2021
Cited by: 42 | Year: 2021
Universal neurons in GPT2 language models
W Gurnee, T Horsley, ZC Guo, TR Kheirkhah, Q Sun, W Hathaway, ...
Transactions of Machine Learning Research (TMLR), 2024
Cited by: 28* | Year: 2024
The Remarkable Robustness of LLMs: Stages of Inference?
V Lad, W Gurnee, M Tegmark
arXiv preprint arXiv:2406.19384, 2024
Cited by: 21* | Year: 2024
Combatting gerrymandering with social choice: The design of multi-member districts
N Garg, W Gurnee, D Rothschild, D Shmoys
Proceedings of the 23rd ACM Conference on Economics and Computation, 560-561, 2022
Cited by: 15 | Year: 2022
Confidence regulation neurons in language models
A Stolfo, B Wu, W Gurnee, Y Belinkov, X Song, M Sachan, N Nanda
Advances in Neural Information Processing Systems 37, 125019-125049, 2025
Cited by: 10* | Year: 2025
SAE reconstruction errors are (empirically) pathological
W Gurnee
AI Alignment Forum, 16, 2024
Cited by: 9* | Year: 2024
Training Dynamics of Contextual N-Grams in Language Models
L Quirke, L Heindrich, W Gurnee, N Nanda
NeurIPS 2023 Workshop on Attributing Model Behavior at Scale, 2023
Cited by: 4 | Year: 2023
Multilevel interpretability of artificial neural networks: leveraging framework and methods from neuroscience
Z He, J Achterberg, K Collins, K Nejad, D Akarca, Y Yang, W Gurnee, ...
arXiv preprint arXiv:2408.12664, 2024
Cited by: 1 | Year: 2024
Scalable approximations of capacitated k-medians for political districting
W Gurnee
Technical report, Cornell University, Ithaca, United States, 2020
Cited by: 1 | Year: 2020