Erik Jenner
Verified email at berkeley.edu
Title
Cited by
Year
Foundational challenges in assuring alignment and safety of large language models
U Anwar, A Saparov, J Rando, D Paleka, M Turpin, P Hase, ES Lubana, ...
TMLR, 2024
Cited by 137*, 2024
imitation: Clean imitation learning implementations
A Gleave, M Taufeeque, J Rocamonde, E Jenner, SH Wang, S Toyer, ...
arXiv preprint arXiv:2211.11972, 2022
Cited by 61, 2022
Steerable Partial Differential Operators for Equivariant Neural Networks
E Jenner, M Weiler
ICLR, 2022
Cited by 31, 2022
Preprocessing Reward Functions for Interpretability
E Jenner, A Gleave
NeurIPS Cooperative AI workshop, 2021
Cited by 12, 2021
Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
E Jenner, S Kapur, V Georgiev, C Allen, S Emmons, S Russell
NeurIPS, 2024
Cited by 7, 2024
When Your AI Deceives You: Challenges with Partial Observability of Human Evaluators in Reward Learning
L Lang, D Foote, S Russell, A Dragan, E Jenner, S Emmons
NeurIPS, 2024
Cited by 7*, 2024
STARC: A General Framework For Quantifying Differences Between Reward Functions
J Skalse, L Farnik, SR Motwani, E Jenner, A Gleave, A Abate
ICLR, 2023
Cited by 7, 2023
Calculus on MDPs: Potential Shaping as a Gradient
E Jenner, H van Hoof, A Gleave
arXiv preprint arXiv:2208.09570, 2022
Cited by 7*, 2022
A comparison of causal scrubbing, causal abstractions, and related methods
E Jenner, A Garriga-Alonso, E Zverev
AI Alignment Forum, 2023
Cited by 4, 2023
A general framework for reward function distances
E Jenner, JMV Skalse, A Gleave
NeurIPS ML Safety Workshop, 2022
Cited by 4, 2022
Diffusion On Syntax Trees For Program Synthesis
S Kapur, E Jenner, S Russell
arXiv preprint arXiv:2405.20519, 2024
Cited by 2, 2024
Obfuscated Activations Bypass LLM Latent-Space Defenses
L Bailey, A Serrano, A Sheshadri, M Seleznyov, J Taylor, E Jenner, ...
arXiv preprint arXiv:2412.09565, 2024
2024
Extensions of Karger's Algorithm: Why They Fail in Theory and How They Are Useful in Practice
E Jenner, EF Sanmartín, FA Hamprecht
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021
2021
AI Can Conceal Undesirable Outputs Even Under White-Box Inspection
A Draguns, E Jenner
Replication: Fairness without demographics through Adversarially Reweighted Learning
E Jenner, T Lieberum, FP Nolte, N Rutsch