Durably reducing conspiracy beliefs through dialogues with AI

TH Costello, G Pennycook, DG Rand - Science, 2024 - science.org
Conspiracy theory beliefs are notoriously persistent. Influential hypotheses propose that they
fulfill important psychological needs, thus resisting counterevidence. Yet previous failures in …

OpenAI o1 system card

A Jaech, A Kalai, A Lerer, A Richardson… - arXiv preprint arXiv …, 2024 - arxiv.org
The o1 model series is trained with large-scale reinforcement learning to reason using chain
of thought. These advanced reasoning capabilities provide new avenues for improving the …

International Scientific Report on the Safety of Advanced AI (Interim Report)

Y Bengio, S Mindermann, D Privitera… - arXiv preprint arXiv …, 2024 - arxiv.org
This is the interim publication of the first International Scientific Report on the Safety of
Advanced AI. The report synthesises the scientific understanding of general-purpose AI--AI …

Safety cases for frontier AI

MD Buhl, G Sett, L Koessler, J Schuett… - arXiv preprint arXiv …, 2024 - arxiv.org
As frontier artificial intelligence (AI) systems become more capable, it becomes more
important that developers can explain why their systems are sufficiently safe. One way to do …

Risk thresholds for frontier AI

L Koessler, J Schuett, M Anderljung - arXiv preprint arXiv:2406.14713, 2024 - arxiv.org
Frontier artificial intelligence (AI) systems could pose increasing risks to public safety and
security. But what level of risk is acceptable? One increasingly popular approach is to define …

Safety case template for frontier AI: A cyber inability argument

A Goemans, MD Buhl, J Schuett, T Korbak… - arXiv preprint arXiv …, 2024 - arxiv.org
Frontier artificial intelligence (AI) systems pose increasing risks to society, making it
essential for developers to provide assurances about their safety. One approach to offering …

The Code That Binds Us: Navigating the Appropriateness of Human-AI Assistant Relationships

A Manzini, G Keeling, L Alberts, S Vallor… - Proceedings of the …, 2024 - ojs.aaai.org
The development of increasingly agentic and human-like AI assistants, capable of
performing a wide range of tasks on user's behalf over time, has sparked heightened interest …

Sabotage evaluations for frontier models

J Benton, M Wagner, E Christiansen, C Anil… - arXiv preprint arXiv …, 2024 - arxiv.org
Sufficiently capable models could subvert human oversight and decision-making in
important contexts. For example, in the context of AI development, models could covertly …