| Title | Cited by | Year |
|---|---|---|
| An overview of catastrophic AI risks. D Hendrycks, M Mazeika, T Woodside. arXiv preprint arXiv:2306.12001, 2023 | 235 | 2023 |
| Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark. A Pan, CJ Shern, A Zou, N Li, S Basart, T Woodside, J Ng, H Zhang, ... International Conference on Machine Learning, 2023 | 132 | 2023 |
| Artificial influence: An analysis of AI-driven persuasion. M Burtell*, T Woodside*. arXiv preprint arXiv:2303.08721, 2023 | 48 | 2023 |
| MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding. SH Wang, A Scardigli, L Tang, W Chen, D Levkin, A Chen, S Ball, ... Empirical Methods in Natural Language Processing, 2023 | 22 | 2023 |
| Responsible Reporting for Frontier AI Development. N Kolt, M Anderljung, J Barnhart, A Brass, K Esvelt, GK Hadfield, L Heim, ... Artificial Intelligence, Ethics, & Society 2024, 2024 | 15 | 2024 |
| Examples of AI improving AI. T Woodside. Retrieved September, 2023 | 4 | 2023 |
| Investigating Trojan Attacks In Large Language Models. T Woodside, M Mazeika, D Radev, D Hendrycks | 2 | 2024 |
| Through the Chat Window and Into the Real World. C Painter, C O'Keefe, I Gabriel, K Fisher, K Ramakrishnan, K Jackson, ... | | 2024 |