SafetyPrompts: a systematic review of open datasets for evaluating and improving large language model safety

P Röttger, F Pernisi, B Vidgen, D Hovy - arXiv preprint arXiv:2404.05399, 2024 - arxiv.org
The last two years have seen a rapid growth in concerns around the safety of large
language models (LLMs). Researchers and practitioners have met these concerns by …

A survey on large language models for software engineering

Q Zhang, C Fang, Y Xie, Y Zhang, Y Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Software Engineering (SE) is the systematic design, development, maintenance, and
management of software applications underpinning the digital infrastructure of our modern …

A comprehensive survey of small language models in the era of large language models: Techniques, enhancements, applications, collaboration with LLMs, and …

F Wang, Z Zhang, X Zhang, Z Wu, T Mo, Q Lu… - arXiv preprint arXiv …, 2024 - ai.radensa.ru
Large language models (LLMs) have demonstrated emergent abilities in text generation,
question answering, and reasoning, facilitating various tasks and domains. Despite their …

Eureka: Evaluating and understanding large foundation models

V Balachandran, J Chen, N Joshi, B Nushi… - arXiv preprint arXiv …, 2024 - arxiv.org
Rigorous and reproducible evaluation is critical for assessing the state of the art and for
guiding scientific advances in Artificial Intelligence. Evaluation is challenging in practice due …

CLAVE: An adaptive framework for evaluating values of LLM generated responses

J Yao, X Yi, X Xie - arXiv preprint arXiv:2407.10725, 2024 - arxiv.org
The rapid progress in Large Language Models (LLMs) poses potential risks such as
generating unethical content. Assessing LLMs' values can help expose their misalignment …

Trustworthy, responsible, and safe ai: A comprehensive architectural framework for ai safety with challenges and mitigations

C Chen, Z Liu, W Jiang, SQ Goh, KKY Lam - arXiv preprint arXiv …, 2024 - arxiv.org
AI Safety is an emerging area of critical importance to the safe adoption and deployment of
AI systems. With the rapid proliferation of AI and especially with the recent advancement of …

ASTRAL: Automated safety testing of large language models

M Ugarte, P Valle, JA Parejo, S Segura… - arXiv preprint arXiv …, 2025 - arxiv.org
Large Language Models (LLMs) have recently gained attention due to their ability to
understand and generate sophisticated human-like content. However, ensuring their safety …

Value compass leaderboard: A platform for fundamental and validated evaluation of LLMs values

J Yao, X Yi, S Duan, J Wang, Y Bai, M Huang… - arXiv preprint arXiv …, 2025 - arxiv.org
As Large Language Models (LLMs) achieve remarkable breakthroughs, aligning their
values with humans has become imperative for their responsible development and …

Early External Safety Testing of OpenAI's o3-mini: Insights from the Pre-Deployment Evaluation

A Arrieta, M Ugarte, P Valle, JA Parejo… - arXiv preprint arXiv …, 2025 - arxiv.org
Large Language Models (LLMs) have become an integral part of our daily lives. However,
they impose certain risks, including those that can harm individuals' privacy, perpetuate …

BenchAgents: Automated Benchmark Creation with Agent Interaction

N Butt, V Chandrasekaran, N Joshi, B Nushi… - arXiv preprint arXiv …, 2024 - arxiv.org
Evaluations are limited by benchmark availability. As models evolve, there is a need to
create benchmarks that can measure progress on new generative capabilities. However …