- Academic Search

PS Park, S Goldstein, A O'Gara, M Chen, D Hendrycks - Patterns, 2024 - cell.com

This paper argues that a range of current AI systems have learned how to deceive humans.
We define deception as the systematic inducement of false beliefs in the pursuit of some …

Salva Cita Citato da 166 Articoli correlati Tutte e 10 le versioni

[Free GPT-4]
[DeepSeek]

[PDF] nature.com

On scientific understanding with artificial intelligence

M Krenn, R Pollice, SY Guo, M Aldeghi… - Nature Reviews …, 2022 - nature.com

An oracle that correctly predicts the outcome of every particle physics experiment, the
products of every possible chemical reaction or the function of every protein would …

Salva Cita Citato da 255 Articoli correlati Tutte e 11 le versioni

[Free GPT-4]
[DeepSeek]

[PDF] mlr.press

Scaling laws for reward model overoptimization

L Gao, J Schulman, J Hilton - International Conference on …, 2023 - proceedings.mlr.press

In reinforcement learning from human feedback, it is common to optimize against a reward
model trained to predict human preferences. Because the reward model is an imperfect …

Salva Cita Citato da 387 Articoli correlati Tutte e 7 le versioni Versione HTML

[Free GPT-4]
[DeepSeek]

[PDF] mlr.press

Guiding pretraining in reinforcement learning with large language models

Y Du, O Watkins, Z Wang, C Colas… - International …, 2023 - proceedings.mlr.press

Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped
reward function. Intrinsically motivated exploration methods address this limitation by …

Salva Cita Citato da 207 Articoli correlati Tutte e 7 le versioni Versione HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arxiv preprint arxiv …, 2024 - arxiv.org

This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Salva Cita Citato da 124 Articoli correlati Tutte e 3 le versioni Versione HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Unsolved problems in ml safety

D Hendrycks, N Carlini, J Schulman… - arxiv preprint arxiv …, 2021 - arxiv.org

Machine learning (ML) systems are rapidly increasing in size, are acquiring new
capabilities, and are increasingly deployed in high-stakes settings. As with other powerful …

Salva Cita Citato da 345 Articoli correlati Tutte e 6 le versioni Versione HTML

[Free GPT-4]
[DeepSeek]

[PDF] datascienceassn.org

Shortcut learning in deep neural networks

R Geirhos, JH Jacobsen, C Michaelis… - Nature Machine …, 2020 - nature.com

Deep learning has triggered the current rise of artificial intelligence and is the workhorse of
today's machine intelligence. Numerous success stories have rapidly spread all over …

Salva Cita Citato da 2225 Articoli correlati Tutte e 12 le versioni

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Ai alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arxiv preprint arxiv …, 2023 - arxiv.org

AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Salva Cita Citato da 232 Articoli correlati Tutte e 3 le versioni Versione HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

The alignment problem from a deep learning perspective

R Ngo, L Chan, S Mindermann - arxiv preprint arxiv:2209.00626, 2022 - arxiv.org

In coming decades, artificial general intelligence (AGI) may surpass human capabilities at
many critical tasks. We argue that, without substantial effort to prevent it, AGIs could learn to …

Salva Cita Citato da 195 Articoli correlati Tutte e 4 le versioni Versione HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Emergent tool use from multi-agent autocurricula

B Baker, I Kanitscheider, T Markov, Y Wu… - arxiv preprint arxiv …, 2019 - arxiv.org

Through multi-agent competition, the simple objective of hide-and-seek, and standard
reinforcement learning algorithms at scale, we find that agents create a self-supervised …

Salva Cita Citato da 914 Articoli correlati Tutte e 3 le versioni Versione HTML

Crea avviso

Cita

Ricerca avanzata

Salvato in La mia biblioteca

The surprising creativity of digital evolution: A collection of anecdotes from the evolutionary...

[HTML][HTML] AI deception: A survey of examples, risks, and potential solutions

On scientific understanding with artificial intelligence

Scaling laws for reward model overoptimization

Guiding pretraining in reinforcement learning with large language models

Foundational challenges in assuring alignment and safety of large language models

Unsolved problems in ml safety

Shortcut learning in deep neural networks

Ai alignment: A comprehensive survey

The alignment problem from a deep learning perspective

Emergent tool use from multi-agent autocurricula