A review of safe reinforcement learning: Methods, theory and applications

S Gu, L Yang, Y Du, G Chen, F Walter, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Reinforcement Learning (RL) has achieved tremendous success in many complex decision-
making tasks. However, safety concerns arise when deploying RL in real-world …

Combustion machine learning: Principles, progress and prospects

M Ihme, WT Chung, AA Mishra - Progress in Energy and Combustion …, 2022 - Elsevier
Progress in combustion science and engineering has led to the generation of large amounts
of data from large-scale simulations, high-resolution experiments, and sensors. This corpus …

RLAIF: Scaling reinforcement learning from human feedback with AI feedback

H Lee, S Phatale, H Mansoor, KR Lu, T Mesnard… - 2023 - openreview.net
Reinforcement learning from human feedback (RLHF) is an effective technique for aligning
large language models (LLMs) to human preferences, but gathering high-quality human …

Connecting the dots in trustworthy Artificial Intelligence: From AI principles, ethics, and key requirements to responsible AI systems and regulation

N Díaz-Rodríguez, J Del Ser, M Coeckelbergh… - Information …, 2023 - Elsevier
Trustworthy Artificial Intelligence (AI) is based on seven technical requirements
sustained over three main pillars that should be met throughout the system's entire life cycle …

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

Beyond the imitation game: Quantifying and extrapolating the capabilities of language models

A Srivastava, A Rastogi, A Rao, AAM Shoeb… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative impact, these new …

Guiding pretraining in reinforcement learning with large language models

Y Du, O Watkins, Z Wang, C Colas… - International …, 2023 - proceedings.mlr.press
Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped
reward function. Intrinsically motivated exploration methods address this limitation by …

A generalist agent

S Reed, K Zolna, E Parisotto, SG Colmenarejo… - arXiv preprint arXiv …, 2022 - arxiv.org
Inspired by progress in large-scale language modeling, we apply a similar approach
towards building a single generalist agent beyond the realm of text outputs. The agent …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Red teaming language models with language models

E Perez, S Huang, F Song, T Cai, R Ring… - arXiv preprint arXiv …, 2022 - arxiv.org
Language Models (LMs) often cannot be deployed because of their potential to harm users
in hard-to-predict ways. Prior work identifies harmful behaviors before deployment by using …