Adversarial training for high-stakes reliability
In the future, powerful AI systems may be deployed in high-stakes settings, where a single
failure could be catastrophic. One technique for improving AI safety in high-stakes settings is …
Multi-modal inverse constrained reinforcement learning from a mixture of demonstrations
Inverse Constraint Reinforcement Learning (ICRL) aims to recover the underlying
constraints respected by expert agents in a data-driven manner. Existing ICRL algorithms …
Scalable Bayesian inverse reinforcement learning
Bayesian inference over the reward presents an ideal solution to the ill-posed nature of the
inverse reinforcement learning problem. Unfortunately current methods generally do not …
Universal off-policy evaluation
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …
Fast Bellman updates for Wasserstein distributionally robust MDPs
Markov decision processes (MDPs) often suffer from the sensitivity issue under model
ambiguity. In recent years, robust MDPs have emerged as an effective framework to …
Entropic risk optimization in discounted MDPs
Risk-averse Markov Decision Processes (MDPs) have optimal policies that achieve
high returns with low variability, but these MDPs are often difficult to solve. Only a few …
STAP: Sequencing task-agnostic policies
Advances in robotic skill acquisition have made it possible to build general-purpose libraries
of learned skills for downstream manipulation tasks. However, naively executing these skills …
Partially observable task and motion planning with uncertainty and risk awareness
Integrated task and motion planning (TAMP) has proven to be a valuable approach to
generalizable long-horizon robotic manipulation and navigation problems. However, the …
Aligning human preferences with baseline objectives in reinforcement learning
Practical implementations of deep reinforcement learning (deep RL) have been challenging
due to a multitude of factors, such as designing reward functions that cover every possible …
Policy gradient Bayesian robust optimization for imitation learning
The difficulty in specifying rewards for many real-world problems has led to an increased
focus on learning rewards from human feedback, such as demonstrations. However, there …