Saffron Huang
Anthropic
Verified email at anthropic.com
Title
Cited by
Year
Scaling language models: Methods, analysis & insights from training Gopher
JW Rae, S Borgeaud, T Cai, K Millican, J Hoffmann, F Song, J Aslanides, ...
arXiv preprint arXiv:2112.11446, 2021
1146
2021
Improving language models by retrieving from trillions of tokens
S Borgeaud, A Mensch, J Hoffmann, T Cai, E Rutherford, K Millican, ...
International Conference on Machine Learning, 2206-2240, 2022
1108
2022
Red teaming language models with language models
E Perez, S Huang, F Song, T Cai, R Ring, J Aslanides, A Glaese, ...
arXiv preprint arXiv:2202.03286, 2022
608
2022
Generative AI and the digital commons
S Huang, D Siddarth
arXiv preprint arXiv:2303.11074, 2023
82
2023
Using the Veil of Ignorance to align AI systems with principles of justice
L Weidinger, KR McKee, R Everett, S Huang, TO Zhu, MJ Chadwick, ...
Proceedings of the National Academy of Sciences 120 (18), e2213709120, 2023
34
2023
Collective Constitutional AI: Aligning a language model with public input
S Huang, D Siddarth, L Lovitt, TI Liao, E Durmus, A Tamkin, D Ganguli
Proceedings of the 2024 ACM Conference on Fairness, Accountability, and …, 2024
25
2024
How large language models can reshape collective intelligence
JW Burton, E Lopez-Lopez, S Hechtlinger, Z Rahwan, S Aeschbach, ...
Nature Human Behaviour 8 (9), 1643-1655, 2024
16
2024
Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks
L Ibrahim, S Huang, L Ahmad, M Anderljung
arXiv preprint arXiv:2405.10632, 2024
12
2024
Collective Constitutional AI: Aligning a language model with public input
D Ganguli, S Huang, L Lovitt, D Siddarth, E Durmus, T Liao, A Askell, ...
Accessed on February 10, 2024, 2023
10
2023
Evaluating feature steering: A case study in mitigating social biases, 2024
E Durmus, A Tamkin, J Clark, J Wei, J Marcus, J Batson, K Handa, L Lovitt, ...
URL https://anthropic.com/research/evaluating-feature-steering
6
A Departure from Truth
S Huang
Harvard Political Review, 2016
4
2016
How will advanced AI systems impact democracy?
C Summerfield, L Argyle, M Bakker, T Collins, E Durmus, T Eloundou, ...
arXiv preprint arXiv:2409.06729, 2024
2
2024
Clio: Privacy-Preserving Insights into Real-World AI Use
A Tamkin, M McCain, K Handa, E Durmus, L Lovitt, A Rathi, S Huang, ...
arXiv preprint arXiv:2412.13678, 2024
2024
Control and Consciousness of Time
S Huang
2023
Bi-Level Multi-Agent Reinforcement Learning for Intervening in Intertemporal Social Dilemmas
S Huang
Harvard University, 2021
2021
Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations
K Handa, A Tamkin, M McCain, S Huang, E Durmus, S Heck, J Mueller, ...