Durably reducing conspiracy beliefs through dialogues with AI
Conspiracy theory beliefs are notoriously persistent. Influential hypotheses propose that they
fulfill important psychological needs, thus resisting counterevidence. Yet previous failures in …
Managing AI risks in an era of rapid progress
OpenAI o1 system card
The o1 model series is trained with large-scale reinforcement learning to reason using chain
of thought. These advanced reasoning capabilities provide new avenues for improving the …
International Scientific Report on the Safety of Advanced AI (Interim Report)
Y Bengio, S Mindermann, D Privitera… - arXiv preprint arXiv …, 2024 - arxiv.org
This is the interim publication of the first International Scientific Report on the Safety of
Advanced AI. The report synthesises the scientific understanding of general-purpose AI--AI …
Safety cases for frontier AI
MD Buhl, G Sett, L Koessler, J Schuett… - arXiv preprint arXiv …, 2024 - arxiv.org
As frontier artificial intelligence (AI) systems become more capable, it becomes more
important that developers can explain why their systems are sufficiently safe. One way to do …
Risk thresholds for frontier AI
Frontier artificial intelligence (AI) systems could pose increasing risks to public safety and
security. But what level of risk is acceptable? One increasingly popular approach is to define …
Safety case template for frontier AI: A cyber inability argument
Frontier artificial intelligence (AI) systems pose increasing risks to society, making it
essential for developers to provide assurances about their safety. One approach to offering …
The Code That Binds Us: Navigating the Appropriateness of Human-AI Assistant Relationships
The development of increasingly agentic and human-like AI assistants, capable of
performing a wide range of tasks on the user's behalf over time, has sparked heightened interest …
Sabotage evaluations for frontier models
Sufficiently capable models could subvert human oversight and decision-making in
important contexts. For example, in the context of AI development, models could covertly …