Dynabench: Rethinking benchmarking in NLP. D Kiela, M Bartolo, Y Nie, D Kaushik, A Geiger, Z Wu, B Vidgen, G Prasad, et al. arXiv preprint arXiv:2104.14337, 2021. Cited by 427.
Directions in abusive language training data, a systematic review: Garbage in, garbage out. B Vidgen, L Derczynski. PLOS ONE 15(12): e0243300, 2020. Cited by 353.
HateCheck: Functional tests for hate speech detection models. P Röttger, B Vidgen, D Nguyen, Z Waseem, H Margetts, JB Pierrehumbert. arXiv preprint arXiv:2012.15606, 2020. Cited by 271.
Learning from the worst: Dynamically generated datasets to improve online hate detection. B Vidgen, T Thrush, Z Waseem, D Kiela. arXiv preprint arXiv:2012.15761, 2020. Cited by 262.
TrustLLM: Trustworthiness in large language models. L Sun, Y Huang, H Wang, S Wu, Q Zhang, C Gao, Y Huang, W Lyu, et al. arXiv preprint arXiv:2401.05561, 2024. Cited by 255.
Challenges and frontiers in abusive content detection. B Vidgen, A Harris, D Nguyen, R Tromble, S Hale, H Margetts. Proceedings of the Third Workshop on Abusive Language Online, 2019. Cited by 234.
Detecting weak and strong Islamophobic hate speech on social media. B Vidgen, T Yasseri. Journal of Information Technology & Politics 17(1), 66-78, 2020. Cited by 208.
Two contrasting data annotation paradigms for subjective NLP tasks. P Röttger, B Vidgen, D Hovy, JB Pierrehumbert. arXiv preprint arXiv:2112.07475, 2021. Cited by 164.
P-Values: Misunderstood and Misused. B Vidgen, T Yasseri. Frontiers in Physics 4, 6, 2016. Cited by 158.
XSTest: A test suite for identifying exaggerated safety behaviours in large language models. P Röttger, HR Kirk, B Vidgen, G Attanasio, F Bianchi, D Hovy. arXiv preprint arXiv:2308.01263, 2023. Cited by 139.
SemEval-2023 Task 10: Explainable detection of online sexism. HR Kirk, W Yin, B Vidgen, P Röttger. arXiv preprint arXiv:2303.04222, 2023. Cited by 137.
An expert annotated dataset for the detection of online misogyny. E Guest, B Vidgen, A Mittos, N Sastry, G Tyson, H Margetts. Proceedings of the 16th Conference of the European Chapter of the …, 2021. Cited by 118.
Detecting East Asian prejudice on social media. B Vidgen, A Botelho, D Broniatowski, E Guest, M Hall, H Margetts, et al. arXiv preprint arXiv:2005.03909, 2020. Cited by 113.
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback. HR Kirk, B Vidgen, P Röttger, SA Hale. arXiv preprint arXiv:2303.05453, 2023. Cited by 102.
Introducing CAD: the contextual abuse dataset. B Vidgen, D Nguyen, H Margetts, P Rossini, R Tromble. 2021. Cited by 102.
The benefits, risks and bounds of personalizing the alignment of large language models to individuals. HR Kirk, B Vidgen, P Röttger, SA Hale. Nature Machine Intelligence 6(4), 383-392, 2024. Cited by 82.
The PRISM alignment project: What participatory, representative and individualised human feedback reveals about the subjective and multicultural alignment of large language models. HR Kirk, A Whitefield, P Röttger, A Bean, K Margatina, J Ciro, R Mosquera, et al. arXiv preprint arXiv:2404.16019, 2024. Cited by 65.
FinanceBench: A new benchmark for financial question answering. P Islam, A Kannappan, D Kiela, R Qian, N Scherrer, B Vidgen. arXiv preprint arXiv:2311.11944, 2023. Cited by 64.
Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate. HR Kirk, B Vidgen, P Röttger, T Thrush, SA Hale. arXiv preprint arXiv:2108.05921, 2021. Cited by 63.
Multilingual HateCheck: Functional tests for multilingual hate speech detection models. P Röttger, H Seelawi, D Nozza, Z Talat, B Vidgen. arXiv preprint arXiv:2206.09917, 2022. Cited by 61.