A review of safe reinforcement learning: Methods, theory and applications
Reinforcement Learning (RL) has achieved tremendous success in many complex decision-
making tasks. However, safety concerns are raised during deploying RL in real-world …
Combustion machine learning: Principles, progress and prospects
Progress in combustion science and engineering has led to the generation of large amounts
of data from large-scale simulations, high-resolution experiments, and sensors. This corpus …
RLAIF: Scaling reinforcement learning from human feedback with AI feedback
Reinforcement learning from human feedback (RLHF) is an effective technique for aligning
large language models (LLMs) to human preferences, but gathering high-quality human …
Connecting the dots in trustworthy Artificial Intelligence: From AI principles, ethics, and key requirements to responsible AI systems and regulation
Trustworthy Artificial Intelligence (AI) is based on seven technical requirements
sustained over three main pillars that should be met throughout the system's entire life cycle …
Open problems and fundamental limitations of reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models
Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative impact, these new …
Guiding pretraining in reinforcement learning with large language models
Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped
reward function. Intrinsically motivated exploration methods address this limitation by …
A generalist agent
Inspired by progress in large-scale language modeling, we apply a similar approach
towards building a single generalist agent beyond the realm of text outputs. The agent …
Foundational challenges in assuring alignment and safety of large language models
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …
Red teaming language models with language models
Language Models (LMs) often cannot be deployed because of their potential to harm users
in hard-to-predict ways. Prior work identifies harmful behaviors before deployment by using …