Wes Gurnee

Cited by

	All	Since 2020
Citations	575	575
h-index	9	9
i10-index	9	9

Citations per year: 2022: 13, 2023: 79, 2024: 441, 2025: 39

Public access

1 article

0 articles

available

not available

Based on funding mandates

Co-authors

Neel Nanda: Mechanistic Interpretability Team Lead, Google DeepMind. Verified email at deepmind.com
Dimitris Bertsimas: Boeing Professor of Operations Research, MIT. Verified email at mit.edu
Max Tegmark: Professor of Physics, MIT. Verified email at mit.edu
Andy Arditi
Nina Panickssery: Anthropic. Verified email at anthropic.com
David Shmoys: Professor of Operations Research & Information Engineering and of Computer Science. Verified email at cs.cornell.edu
Matthew Pauly: Undergraduate Student, Harvard University. Verified email at college.harvard.edu
Isaac Liao: Carnegie Mellon University. Verified email at andrew.cmu.edu
Josh Engels: PhD Student, MIT. Verified email at mit.edu
Zifan Carl Guo: MIT. Verified email at mit.edu
Eric J. Michaud: Graduate student, MIT. Verified email at mit.edu
Nikhil Garg: Assistant Professor, Cornell Tech. Verified email at cornell.edu
David Rothschild: Microsoft Research. Verified email at researchdmr.com
Lovis Heindrich: Max Planck Institute for Intelligent Systems. Verified email at tuebingen.mpg.de

Wes Gurnee

Anthropic

Verified email at mit.edu

Mechanistic Interpretability, AI Alignment, Optimization, Governance


Title; Authors; Venue	Cited by	Year
Language models represent space and time; W Gurnee, M Tegmark; ICLR 2024, 2023	168	2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing; W Gurnee, N Nanda, M Pauly, K Harvey, D Troitskii, D Bertsimas; Transactions of Machine Learning Research (TMLR), 2023	132	2023
Refusal in language models is mediated by a single direction; A Arditi, O Obeso, A Syed, D Paleka, N Panickssery, W Gurnee, N Nanda; arXiv preprint arXiv:2406.11717, 2024	68*	2024
Learning sparse nonlinear dynamics via mixed-integer optimization; D Bertsimas, W Gurnee; Nonlinear Dynamics 111 (7), 6585-6604, 2023	45	2023
Fairmandering: A column generation heuristic for fairness-optimized political districting; W Gurnee, DB Shmoys; SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), 88-99, 2021	42	2021
Not all language model features are linear; J Engels, EJ Michaud, I Liao, W Gurnee, M Tegmark; arXiv preprint arXiv:2405.14860, 2024	35*	2024
Universal neurons in GPT2 language models; W Gurnee, T Horsley, ZC Guo, TR Kheirkhah, Q Sun, W Hathaway, ...; Transactions of Machine Learning Research (TMLR), 2024	22*	2024
The Remarkable Robustness of LLMs: Stages of Inference?; V Lad, W Gurnee, M Tegmark; arXiv preprint arXiv:2406.19384, 2024	19*	2024
Combatting gerrymandering with social choice: The design of multi-member districts; N Garg, W Gurnee, D Rothschild, D Shmoys; Proceedings of the 23rd ACM Conference on Economics and Computation, 560-561, 2022	14	2022
SAE reconstruction errors are (empirically) pathological; W Gurnee; AI Alignment Forum, 16, 2024	8*	2024
Confidence regulation neurons in language models; A Stolfo, B Wu, W Gurnee, Y Belinkov, X Song, M Sachan, N Nanda; arXiv preprint arXiv:2406.16254, 2024	6*	2024
Language models represent space and time. arXiv; W Gurnee, M Tegmark; arXiv preprint arXiv:2310.02207, 2024	5	2024
Language models represent space and time (arXiv: 2310.02207). arXiv; W Gurnee, M Tegmark	5	2023
Training Dynamics of Contextual N-Grams in Language Models; L Quirke, L Heindrich, W Gurnee, N Nanda; NeurIPS 2023 Workshop on Attributing Model Behavior at Scale, 2023	4	2023
Multilevel interpretability of artificial neural networks: leveraging framework and methods from neuroscience; Z He, J Achterberg, K Collins, K Nejad, D Akarca, Y Yang, W Gurnee, ...; arXiv preprint arXiv:2408.12664, 2024	1	2024
Scalable approximations of capacitated k-medians for political districting; W Gurnee; Technical report, Cornell University, Ithaca, United States, 2020	1	2020

Articles 1–16
