Gemini: a family of highly capable multimodal models Gemini Team, R Anil, S Borgeaud, JB Alayrac, J Yu, R Soricut, J Schalkwyk, ... arXiv preprint arXiv:2312.11805, 2023 | 2510 | 2023 |
Scaling language models: Methods, analysis & insights from training gopher JW Rae, S Borgeaud, T Cai, K Millican, J Hoffmann, F Song, J Aslanides, ... arXiv preprint arXiv:2112.11446, 2021 | 1149 | 2021 |
Ethical and social risks of harm from language models L Weidinger, J Mellor, M Rauh, C Griffin, J Uesato, PS Huang, M Cheng, ... arXiv preprint arXiv:2112.04359, 2021 | 1056 | 2021 |
Taxonomy of risks posed by language models L Weidinger, J Uesato, M Rauh, C Griffin, PS Huang, J Mellor, A Glaese, ... Proceedings of the 2022 ACM Conference on Fairness, Accountability, and …, 2022 | 617 | 2022 |
Improving alignment of dialogue agents via targeted human judgements A Glaese, N McAleese, M Trębacz, J Aslanides, V Firoiu, T Ewalds, ... arXiv preprint arXiv:2209.14375, 2022 | 470 | 2022 |
Alignment of language agents Z Kenton, T Everitt, L Weidinger, I Gabriel, V Mikulik, G Irving arXiv preprint arXiv:2103.14659, 2021 | 165 | 2021 |
Sociotechnical safety evaluation of generative AI systems L Weidinger, M Rauh, N Marchal, A Manzini, LA Hendricks, ... arXiv preprint arXiv:2310.11986, 2023 | 136 | 2023 |
Spurious normativity enhances learning of compliance and enforcement behavior in artificial agents R Köster, D Hadfield-Menell, R Everett, L Weidinger, GK Hadfield, ... Proceedings of the National Academy of Sciences 119 (3), e2106028118, 2022 | 71* | 2022 |
The ethics of advanced AI assistants I Gabriel, A Manzini, G Keeling, LA Hendricks, V Rieser, H Iqbal, ... arXiv preprint arXiv:2404.16244, 2024 | 52 | 2024 |
Characteristics of harmful text: Towards rigorous benchmarking of language models M Rauh, J Mellor, J Uesato, PS Huang, J Welbl, L Weidinger, S Dathathri, ... Advances in Neural Information Processing Systems 35, 24720-24739, 2022 | 48 | 2022 |
Using the Veil of Ignorance to align AI systems with principles of justice L Weidinger, KR McKee, R Everett, S Huang, TO Zhu, MJ Chadwick, ... Proceedings of the National Academy of Sciences 120 (18), e2213709120, 2023 | 36 | 2023 |
Social conformity in autism SC Lazzaro, L Weidinger, RA Cooper, S Baron-Cohen, C Moutsiana, ... Journal of Autism and Developmental Disorders 49, 1304-1315, 2019 | 28 | 2019 |
Towards responsible development of generative AI for education: An evaluation-driven approach I Jurenka, M Kunesch, KR McKee, D Gillick, S Zhu, S Wiltberger, SM Phal, ... arXiv preprint arXiv:2407.12687, 2024 | 26 | 2024 |
Test–retest reliability of reinforcement learning parameters JV Schaaf, L Weidinger, L Molleman, W van den Bos Behavior Research Methods 56 (5), 4582-4599, 2024 | 25* | 2024 |
Accounting for offensive speech as a practice of resistance M Díaz, R Amironesei, L Weidinger, I Gabriel Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH), 192-202, 2022 | 19 | 2022 |
Holistic safety and responsibility evaluations of advanced AI models L Weidinger, J Barnhart, J Brennan, C Butterfield, S Young, W Hawkins, ... arXiv preprint arXiv:2404.14068, 2024 | 15 | 2024 |
Improving alignment of dialogue agents via targeted human judgements, 2022 A Glaese, N McAleese, M Trebacz, J Aslanides, V Firoiu, T Ewalds, ... URL https://storage.googleapis.com/deepmind-media/DeepMind.com/Authors …, 2022 | 11 | 2022 |
STAR: SocioTechnical Approach to Red Teaming Language Models L Weidinger, J Mellor, BG Pegueroles, N Marchal, R Kumar, K Lum, ... arXiv preprint arXiv:2406.11757, 2024 | 9 | 2024 |
Language modelling at scale: Gopher, ethical considerations, and retrieval J Rae, G Irving, L Weidinger DeepMind Blog, 2021 | 9 | 2021 |
Operationalizing contextual integrity in privacy-conscious assistants S Ghalebikesabi, E Bagdasaryan, R Yi, I Yona, I Shumailov, A Pappu, ... arXiv preprint arXiv:2408.02373, 2024 | 6 | 2024 |