Dynabench: Rethinking benchmarking in NLP. D Kiela, M Bartolo, Y Nie, D Kaushik, A Geiger, Z Wu, B Vidgen, G Prasad, et al. arXiv preprint arXiv:2104.14337, 2021. Cited by 427.
Directions in abusive language training data, a systematic review: Garbage in, garbage out. B Vidgen, L Derczynski. PLOS ONE 15(12): e0243300, 2020. Cited by 353.
HateCheck: Functional tests for hate speech detection models. P Röttger, B Vidgen, D Nguyen, Z Waseem, H Margetts, JB Pierrehumbert. arXiv preprint arXiv:2012.15606, 2020. Cited by 271.
Learning from the worst: Dynamically generated datasets to improve online hate detection. B Vidgen, T Thrush, Z Waseem, D Kiela. arXiv preprint arXiv:2012.15761, 2020. Cited by 262.
TrustLLM: Trustworthiness in large language models. L Sun, Y Huang, H Wang, S Wu, Q Zhang, C Gao, Y Huang, W Lyu, et al. arXiv preprint arXiv:2401.05561, 2024. Cited by 255.
Challenges and frontiers in abusive content detection. B Vidgen, A Harris, D Nguyen, R Tromble, S Hale, H Margetts. Proceedings of the Third Workshop on Abusive Language Online, 2019. Cited by 234.
Detecting weak and strong Islamophobic hate speech on social media. B Vidgen, T Yasseri. Journal of Information Technology & Politics 17(1), 66-78, 2020. Cited by 208.
Two contrasting data annotation paradigms for subjective NLP tasks. P Röttger, B Vidgen, D Hovy, JB Pierrehumbert. arXiv preprint arXiv:2112.07475, 2021. Cited by 164.
P-Values: Misunderstood and Misused. B Vidgen, T Yasseri. Frontiers in Physics 4, 6, 2016. Cited by 158.
XSTest: A test suite for identifying exaggerated safety behaviours in large language models. P Röttger, HR Kirk, B Vidgen, G Attanasio, F Bianchi, D Hovy. arXiv preprint arXiv:2308.01263, 2023. Cited by 139.
SemEval-2023 Task 10: Explainable detection of online sexism. HR Kirk, W Yin, B Vidgen, P Röttger. arXiv preprint arXiv:2303.04222, 2023. Cited by 137.
An expert annotated dataset for the detection of online misogyny. E Guest, B Vidgen, A Mittos, N Sastry, G Tyson, H Margetts. Proceedings of the 16th Conference of the European Chapter of the …, 2021. Cited by 118.
Detecting East Asian prejudice on social media. B Vidgen, A Botelho, D Broniatowski, E Guest, M Hall, H Margetts, et al. arXiv preprint arXiv:2005.03909, 2020. Cited by 113.
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback. HR Kirk, B Vidgen, P Röttger, SA Hale. arXiv preprint arXiv:2303.05453, 2023. Cited by 102.
Introducing CAD: the contextual abuse dataset. B Vidgen, D Nguyen, H Margetts, P Rossini, R Tromble. 2021. Cited by 102.
The benefits, risks and bounds of personalizing the alignment of large language models to individuals. HR Kirk, B Vidgen, P Röttger, SA Hale. Nature Machine Intelligence 6(4), 383-392, 2024. Cited by 82.
The PRISM alignment project: What participatory, representative and individualised human feedback reveals about the subjective and multicultural alignment of large language models. HR Kirk, A Whitefield, P Röttger, A Bean, K Margatina, J Ciro, R Mosquera, et al. arXiv preprint arXiv:2404.16019, 2024. Cited by 65.
FinanceBench: A new benchmark for financial question answering. P Islam, A Kannappan, D Kiela, R Qian, N Scherrer, B Vidgen. arXiv preprint arXiv:2311.11944, 2023. Cited by 64.
Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate. HR Kirk, B Vidgen, P Röttger, T Thrush, SA Hale. arXiv preprint arXiv:2108.05921, 2021. Cited by 63.
Multilingual HateCheck: Functional tests for multilingual hate speech detection models. P Röttger, H Seelawi, D Nozza, Z Talat, B Vidgen. arXiv preprint arXiv:2206.09917, 2022. Cited by 61.