SafetyPrompts: a systematic review of open datasets for evaluating and improving large language model safety

P Röttger, F Pernisi, B Vidgen, D Hovy - arXiv preprint arXiv:2404.05399, 2024 - arxiv.org
The last two years have seen a rapid growth in concerns around the safety of large
language models (LLMs). Researchers and practitioners have met these concerns by …

Towards bidirectional human-AI alignment: A systematic review for clarifications, framework, and future directions

H Shen, T Knearem, R Ghosh, K Alkiek… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in general-purpose AI have highlighted the importance of guiding AI
systems towards the intended goals, ethical principles, and values of individuals and …

Political compass or spinning arrow? Towards more meaningful evaluations for values and opinions in large language models

P Röttger, V Hofmann, V Pyatkin, M Hinck… - arXiv preprint arXiv …, 2024 - arxiv.org
Much recent work seeks to evaluate values and opinions in large language models (LLMs)
using multiple-choice surveys and questionnaires. Most of this work is motivated by …

Open problems in technical AI governance

A Reuel, B Bucknall, S Casper, T Fist, L Soder… - arXiv preprint arXiv …, 2024 - arxiv.org
AI progress is creating a growing range of risks and opportunities, but it is often unclear how
they should be navigated. In many cases, the barriers and uncertainties faced are at least …

Conifer: Improving complex constrained instruction-following ability of large language models

H Sun, L Liu, J Li, F Wang, B Dong, R Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
The ability of large language models (LLMs) to follow instructions is crucial to real-world
applications. Despite recent advances, several studies have highlighted that LLMs struggle …

Gender, race, and intersectional bias in resume screening via language model retrieval

K Wilson, A Caliskan - Proceedings of the AAAI/ACM Conference on AI …, 2024 - ojs.aaai.org
Artificial intelligence (AI) hiring tools have revolutionized resume screening, and large
language models (LLMs) have the potential to do the same. However, given the biases …

Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks

L Ibrahim, S Huang, L Ahmad, M Anderljung - arXiv preprint arXiv …, 2024 - arxiv.org
Model evaluations are central to understanding the safety, risks, and societal impacts of AI
systems. While most real-world AI applications involve human-AI interaction, most current …

Structured chemistry reasoning with large language models

S Ouyang, Z Zhang, B Yan, X Liu, Y Choi, J Han… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) excel in diverse areas, yet struggle with complex scientific
reasoning, especially in the field of chemistry. Different from the simple chemistry tasks (e.g. …

Instruct and extract: Instruction tuning for on-demand information extraction

Y Jiao, M Zhong, S Li, R Zhao, S Ouyang, H Ji… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models with instruction-following capabilities open the door to a wider
group of users. However, when it comes to information extraction, a classic task in natural …

Dolomites: Domain-Specific Long-Form Methodical Tasks

C Malaviya, P Agrawal, K Ganchev… - Transactions of the …, 2025 - direct.mit.edu
Experts in various fields routinely perform methodical writing tasks to plan, organize, and
report their work. From a clinician writing a differential diagnosis for a patient, to a teacher …