Joe Benton
Anthropic
Verified email at anthropic.com
Title
Cited by
Year
A Continuous Time Framework for Discrete Denoising Models
A Campbell, J Benton, V De Bortoli, T Rainforth, G Deligiannidis, ...
Advances in Neural Information Processing Systems 35, 28266-28279, 2022
Cited by: 133; Year: 2022
Nearly d-Linear Convergence Bounds for Diffusion Models via Stochastic Localization
J Benton, V De Bortoli, A Doucet, G Deligiannidis
International Conference on Learning Representations, 2024
Cited by: 130*; Year: 2024
Many-shot jailbreaking
C Anil, E Durmus, N Rimsky, M Sharma, J Benton, S Kundu, J Batson, ...
The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
Cited by: 82; Year: 2024
Error Bounds for Flow Matching Methods
J Benton, G Deligiannidis, A Doucet
Transactions on Machine Learning Research, 2024
Cited by: 48; Year: 2024
From Denoising Diffusions to Denoising Markov Models
J Benton, Y Shi, V De Bortoli, G Deligiannidis, A Doucet
Journal of the Royal Statistical Society Series B: Statistical Methodology …, 2024
Cited by: 31; Year: 2024
Polysemanticity and Capacity in Neural Networks
A Scherlis, K Sachan, AS Jermyn, J Benton, B Shlegeris
arXiv preprint arXiv:2210.01892, 2022
Cited by: 28; Year: 2022
Alpha-divergence Variational Inference Meets Importance Weighted Auto-Encoders: Methodology and Asymptotics
K Daudel, J Benton, Y Shi, A Doucet
Journal of Machine Learning Research 24 (243), 1-83, 2023
Cited by: 10; Year: 2023
When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?
R Schaeffer, D Valentine, L Bailey, J Chua, C Eyzaguirre, Z Durante, ...
arXiv preprint arXiv:2407.15211, 2024
Cited by: 6; Year: 2024
Sabotage Evaluations for Frontier Models
J Benton, M Wagner, E Christiansen, C Anil, E Perez, J Srivastav, ...
arXiv preprint arXiv:2410.21514, 2024
Cited by: 5; Year: 2024
Measuring Feature Sparsity in Language Models
M Deng, L Tao, J Benton
NeurIPS 2023 Workshop on Socially Responsible Language Modelling Research, 2023
Cited by: 1; Year: 2023