Thomas Woodside
Center for AI Safety Action Fund
Verified email at safe.ai
Title
Cited by
Year
An overview of catastrophic AI risks
D Hendrycks, M Mazeika, T Woodside
arXiv preprint arXiv:2306.12001, 2023
Cited by: 235; Year: 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
A Pan, CJ Shern, A Zou, N Li, S Basart, T Woodside, J Ng, H Zhang, ...
International Conference on Machine Learning, 2023
Cited by: 132; Year: 2023
Artificial influence: An analysis of AI-driven persuasion
M Burtell*, T Woodside*
arXiv preprint arXiv:2303.08721, 2023
Cited by: 48; Year: 2023
MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding
SH Wang, A Scardigli, L Tang, W Chen, D Levkin, A Chen, S Ball, ...
Empirical Methods in Natural Language Processing, 2023
Cited by: 22; Year: 2023
Responsible Reporting for Frontier AI Development
N Kolt, M Anderljung, J Barnhart, A Brass, K Esvelt, GK Hadfield, L Heim, ...
Artificial Intelligence, Ethics, & Society 2024, 2024
Cited by: 15; Year: 2024
Examples of AI improving AI
T Woodside
Retrieved September, 2023
Cited by: 4; Year: 2023
Investigating Trojan Attacks In Large Language Models
T Woodside, M Mazeika, D Radev, D Hendrycks
Cited by: 2; Year: 2024
Through the Chat Window and Into the Real World
C Painter, C O'Keefe, I Gabriel, K Fisher, K Ramakrishnan, K Jackson, ...
Year: 2024