Markus Anderljung
Centre for the Governance of AI
Verified email at governance.ai
Title
Cited by
Year
Toward trustworthy AI development: mechanisms for supporting verifiable claims
M Brundage, S Avin, J Wang, H Belfield, G Krueger, G Hadfield, H Khlaaf, ...
arXiv preprint arXiv:2004.07213, 2020
Cited by: 466; Year: 2020
Model evaluation for extreme risks
T Shevlane, S Farquhar, B Garfinkel, M Phuong, J Whittlestone, J Leung, ...
arXiv preprint arXiv:2305.15324, 2023
Cited by: 173; Year: 2023
Frontier AI regulation: Managing emerging risks to public safety
M Anderljung, J Barnhart, A Korinek, J Leung, C O'Keefe, J Whittlestone, ...
arXiv preprint arXiv:2307.03718, 2023
Cited by: 148; Year: 2023
Foundational challenges in assuring alignment and safety of large language models
U Anwar, A Saparov, J Rando, D Paleka, M Turpin, P Hase, ES Lubana, ...
arXiv preprint arXiv:2404.09932, 2024
Cited by: 136; Year: 2024
Ethics and governance of artificial intelligence: Evidence from a survey of machine learning researchers
B Zhang, M Anderljung, L Kahn, N Dreksler, MC Horowitz, A Dafoe
Journal of Artificial Intelligence Research 71, 591–666, 2021
Cited by: 88; Year: 2021
The Brussels effect and artificial intelligence: How EU regulation will impact the global AI market
C Siegmann, M Anderljung
arXiv preprint arXiv:2208.12645, 2022
Cited by: 72; Year: 2022
Institutionalizing ethics in AI through broader impact requirements
CEA Prunkl, C Ashurst, M Anderljung, H Webb, J Leike, A Dafoe
Nature Machine Intelligence 3 (2), 104-110, 2021
Cited by: 72; Year: 2021
Filling gaps in trustworthy development of AI
NZ Shahar Avin, Haydn Belfield, Miles Brundage, Gretchen Krueger, Jasmine ...
Science 374 (6573), pp. 1327-1329, 2021
Cited by: 53; Year: 2021
Computing power and the governance of artificial intelligence
G Sastry, L Heim, H Belfield, M Anderljung, M Brundage, J Hazell, ...
arXiv preprint arXiv:2402.08797, 2024
Cited by: 50; Year: 2024
Towards best practices in AGI safety and governance: A survey of expert opinion
J Schuett, N Dreksler, M Anderljung, D McCaffary, L Heim, E Bluemke, ...
arXiv preprint arXiv:2305.07153, 2023
Cited by: 50; Year: 2023
Open-sourcing highly capable foundation models: An evaluation of risks, benefits, and alternative methods for pursuing open-source objectives
E Seger, N Dreksler, R Moulange, E Dardaman, J Schuett, K Wei, ...
arXiv preprint arXiv:2311.09227, 2023
Cited by: 45; Year: 2023
Protecting society from AI misuse: when are restrictions on capabilities warranted?
M Anderljung, J Hazell, M von Knebel
AI & SOCIETY, 1-17, 2024
Cited by: 38; Year: 2024
Forecasting AI progress: Evidence from a survey of machine learning researchers
B Zhang, N Dreksler, M Anderljung, L Kahn, C Giattino, A Dafoe, ...
arXiv preprint arXiv:2206.04132, 2022
Cited by: 32; Year: 2022
Open problems in technical AI governance
A Reuel, B Bucknall, S Casper, T Fist, L Soder, O Aarne, L Hammond, ...
arXiv preprint arXiv:2407.14981, 2024
Cited by: 28; Year: 2024
Towards publicly accountable frontier LLMs: Building an external scrutiny ecosystem under the ASPIRE framework
M Anderljung, ET Smith, J O'Brien, L Soder, B Bucknall, E Bluemke, ...
arXiv preprint arXiv:2311.14711, 2023
Cited by: 28; Year: 2023
Visibility into AI agents
A Chan, C Ezell, M Kaufmann, K Wei, L Hammond, H Bradley, E Bluemke, ...
Proceedings of the 2024 ACM Conference on Fairness, Accountability, and …, 2024
Cited by: 23; Year: 2024
Social and governance implications of improved data efficiency
AD Tucker, M Anderljung, A Dafoe
Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 378-384, 2020
Cited by: 19; Year: 2020
Model evaluation for extreme risks, 2023
T Shevlane, S Farquhar, B Garfinkel, M Phuong, J Whittlestone, J Leung, ...
URL https://arxiv.org/abs/2305.15324
Cited by: 17
Responsible reporting for frontier AI development
N Kolt, M Anderljung, J Barnhart, A Brass, K Esvelt, GK Hadfield, L Heim, ...
Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 7, 768-783, 2024
Cited by: 15; Year: 2024
A guide to writing the NeurIPS impact statement
C Ashurst, M Anderljung, C Prunkl, J Leike, Y Gal, T Shevlane, A Dafoe
Centre for the Governance of AI. URL: https://perma.cc/B5R8-2B9V, 2020
Cited by: 14; Year: 2020
Articles 1–20