AI alignment: A comprehensive survey
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, so do risks from misalignment. To provide a comprehensive …
A survey of imitation learning: Algorithms, recent developments, and challenges
M Zare, PM Kebria, A Khosravi… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
In recent years, the development of robotics and artificial intelligence (AI) systems has been
nothing short of remarkable. As these systems continue to evolve, they are being utilized in …
A survey of reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …
Towards guaranteed safe AI: A framework for ensuring robust and reliable AI systems
Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a
crucial challenge, especially for AI systems with a high degree of autonomy and general …
Maximum-likelihood inverse reinforcement learning with finite-time guarantees
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated
optimal policy that best fits observed sequences of states and actions implemented by an …
Invariance in policy optimisation and partial identifiability in reward learning
It is often very challenging to manually design reward functions for complex, real-world
tasks. To solve this, one can instead use reward learning to infer a reward function from …
Misspecification in inverse reinforcement learning
The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function R from
a policy π. To do this, we need a model of how π relates to R. In the current literature, the …
Beyond preferences in AI alignment
The dominant practice of AI alignment assumes (1) that preferences are an adequate
representation of human values, (2) that human rationality can be understood in terms of …
Identifiability in inverse reinforcement learning
Inverse reinforcement learning attempts to reconstruct the reward function in a Markov
decision problem, using observations of agent actions. As already observed in Russell …
Models of human preference for learning reward functions
The utility of reinforcement learning is limited by the alignment of reward functions with the
interests of human stakeholders. One promising method for alignment is to learn the reward …