Scalable agent alignment via reward modeling: a research direction

J Leike, D Krueger, T Everitt, M Martic, V Maini… - arXiv preprint arXiv …, 2018 - arxiv.org
One obstacle to applying reinforcement learning algorithms to real-world problems is the
lack of suitable reward functions. Designing such reward functions is difficult in part because …

A game-theoretic approach to containing artificial general intelligence: Insights from highly autonomous aggressive malware

TR McIntosh, T Susnjak, T Liu, P Watters… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Artificial general intelligence (AGI) promises transformative societal changes but poses
safety and containment challenges. Large language models such as ChatGPT have …

Concrete problems in AI safety

D Amodei, C Olah, J Steinhardt, P Christiano… - arXiv preprint arXiv …, 2016 - arxiv.org
Rapid progress in machine learning and artificial intelligence (AI) has brought increasing
attention to the potential impacts of AI technologies on society. In this paper we discuss one …

Artificial intelligence safety and cybersecurity: A timeline of AI failures

RV Yampolskiy, MS Spellchecker - arXiv preprint arXiv:1610.07997, 2016 - arxiv.org
In this work, we present and analyze reported failures of artificially intelligent systems and
extrapolate our analysis to future AIs. We suggest that both the frequency and the …

On monitorability of AI

RV Yampolskiy - AI and Ethics, 2024 - Springer
Artificially intelligent (AI) systems have ushered in a transformative era across various
domains, yet their inherent traits of unpredictability, unexplainability, and uncontrollability …

[BOOK][B] AI: Unexplainable, unpredictable, uncontrollable

RV Yampolskiy - 2024 - books.google.com
Delving into the deeply enigmatic nature of Artificial Intelligence (AI), AI: Unexplainable,
Unpredictable, Uncontrollable explores the various reasons why the field is so challenging …

Superintelligence cannot be contained: Lessons from computability theory

M Alfonseca, M Cebrian, AF Anta, L Coviello… - Journal of Artificial …, 2021 - jair.org
Superintelligence is a hypothetical agent that possesses intelligence far surpassing that of
the brightest and most gifted human minds. In light of recent advances in machine …

[BOOK][B] The Book of Chatbots: From ELIZA to ChatGPT

R Ciesla - 2024 - Springer
Primitive software chatbots emerged in the 1960s, evolving swiftly through the decades and
becoming able to provide engaging human-to-computer interactions sometime in the 1990s …

Unpredictability of AI: On the impossibility of accurately predicting all actions of a smarter agent

RV Yampolskiy - Journal of Artificial Intelligence and …, 2020 - World Scientific
The young field of AI Safety is still in the process of identifying its challenges and limitations.
In this paper, we formally describe one such impossibility result, namely Unpredictability of …

Predicting future AI failures from historic examples

RV Yampolskiy - Foresight, 2019 - emerald.com
Purpose: The purpose of this paper is to explain to readers how intelligent systems can fail
and how artificial intelligence (AI) safety is different from cybersecurity. The goal of …