Paul Röttger

Cited by

	All	Since 2020
Citations	1540	1539
h-index	17	17
i10-index	21	21

Year	2021	2022	2023	2024	2025
Citations	18	119	336	963	98

Public access

2 articles

0 articles

available

not available

Based on funding mandates

Co-authors

Bertie Vidgen - Oxford, Turing - Verified email at rewire.online
Hannah Rose Kirk - University of Oxford - Verified email at oii.ox.ac.uk
Dirk Hovy - Bocconi University - Verified email at unibocconi.it
Janet B. Pierrehumbert - Prof. of Language Modelling, Univ. of Oxford Dept. of Engineering Science - Verified email at oerc.ox.ac.uk
Helen Margetts - Professor of Society and the Internet, University of Oxford - Verified email at oii.ox.ac.uk
Giuseppe Attanasio - Postdoctoral Researcher, Instituto de Telecomunicações - Verified email at lx.it.pt
Debora Nozza - Assistant Professor, Bocconi University - Verified email at unibocconi.it

Paul Röttger

Postdoctoral Researcher, Bocconi University

Verified email at unibocconi.it

Natural Language Processing, Large Language Models, Online Harms, AI Safety


Title; Authors; Venue	Cited by	Year
HateCheck: Functional Tests for Hate Speech Detection Models; P Röttger, B Vidgen, D Nguyen, Z Waseem, H Margetts, J Pierrehumbert; ACL 2021 (Main) - 🏆 Stanford HAI AI Audit Challenge, 2021	270	2021
The Benefits, Risks and Bounds of Personalizing the Alignment of Large Language Models to Individuals; HR Kirk, B Vidgen, P Röttger, SA Hale; Nature Machine Intelligence, 2024	172*	2024
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks; P Röttger, B Vidgen, D Hovy, JB Pierrehumbert; NAACL 2022 (Main), 2022	164	2022
Safety-Tuned Llamas: Lessons from Improving the Safety of Large Language Models that Follow Instructions; F Bianchi, M Suzgun, G Attanasio, P Röttger, D Jurafsky, T Hashimoto, ...; ICLR 2024 (Poster), 2023	135	2023
SemEval-2023 Task 10: Explainable Detection of Online Sexism; HR Kirk, W Yin, B Vidgen, P Röttger; ACL 2023 (Main) - 🏆 Best Task Paper, 2023	135	2023
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models; P Röttger, HR Kirk, B Vidgen, G Attanasio, F Bianchi, D Hovy; NAACL 2024 (Main), 2023	121	2023
Temporal Adaptation of BERT and Performance on Downstream Document Classification: Insights from Social Media; P Röttger, JB Pierrehumbert; EMNLP 2021 (Findings), 2021	69	2021
Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate; HR Kirk, B Vidgen, P Röttger, T Thrush, SA Hale; NAACL 2022 (Main), 2021	63	2021
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models; P Röttger, H Seelawi, D Nozza, Z Talat, B Vidgen; WOAH at NAACL 2022, 2022	61	2022
The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals about the Subjective and Multicultural Alignment of Large Language Models; HR Kirk, A Whitefield, P Röttger, AM Bean, K Margatina, R Mosquera, ...; NeurIPS 2024 (Oral) - 🏆 Best Paper (Datasets & Benchmarks), 2024	49*	2024
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models; P Röttger, V Hofmann, V Pyatkin, M Hinck, HR Kirk, H Schütze, D Hovy; ACL 2024 (Main) - 🏆 Outstanding Paper, 2024	43	2024
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values; HR Kirk, AM Bean, B Vidgen, P Röttger, SA Hale; EMNLP 2023 (Main), 2023	35	2023
"My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models; X Wang, B Ma, C Hu, L Weber-Genzel, P Röttger, F Kreuter, D Hovy, ...; ACL 2024 (Findings), 2024	31	2024
Introducing v0.5 of the AI Safety Benchmark from MLCommons; B Vidgen, A Agrawal, AM Ahmed, V Akinwande, N Al-Nuaimi, N Alfaraj, ...; arXiv, 2024	30	2024
SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models; B Vidgen, HR Kirk, R Qian, N Scherrer, A Kannappan, SA Hale, P Röttger; arXiv, 2023	26	2023
The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics; M Orlikowski, P Röttger, P Cimiano, D Hovy; ACL 2023 (Main), 2023	22	2023
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety; P Röttger, F Pernisi, B Vidgen, D Hovy; arXiv preprint arXiv:2404.05399, 2024	20	2024
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages; P Röttger, D Nozza, F Bianchi, D Hovy; EMNLP 2022 (Main), 2022	17	2022
Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ; C Holtermann, P Röttger, T Dill, A Lauscher; ACL 2024 (Findings), 2024	16	2024
The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising “Alignment” in Large Language Models; HR Kirk, B Vidgen, P Röttger, SA Hale; SoLaR at NeurIPS 2023, 2023	13	2023
