[PDF] LTL and beyond: Formal languages for reward function specification in reinforcement learning
Abstract In Reinforcement Learning (RL), an agent is guided by the rewards it receives from
the reward function. Unfortunately, it may take many interactions with the environment to …
[PDF] Teaching multiple tasks to an RL agent using LTL
Reinforcement Learning (RL) algorithms are capable of learning effective behaviours
through trial and error interactions with their environment [40]. The recent combination of …
Foundations for restraining bolts: Reinforcement learning with LTLf/LDLf restraining specifications
In this work we investigate the concept of the "restraining bolt", envisioned in science fiction.
Specifically, we introduce a novel problem in AI. We have two distinct sets of features …
A formal methods approach to interpretable reinforcement learning for robotic planning
Growing interest in reinforcement learning approaches to robotic planning and control raises
concerns of predictability and safety of robot behaviors realized solely through learned …
Reinforcement learning with non-markovian rewards
The standard RL world model is that of a Markov Decision Process (MDP). A basic premise
of MDPs is that the rewards depend on the last state and action only. Yet, many real-world …
LTLf/LDLf non-markovian rewards
Abstract In Markov Decision Processes (MDPs), the reward obtained in a state is Markovian,
i.e., it depends only on the last state and action. This dependency makes it difficult to reward more …
[BOOK] Multi-objective decision making
Many real-world decision problems have multiple objectives. For example, when choosing a
medical treatment plan, we want to maximize the efficacy of the treatment, but also minimize …
Pure-past linear temporal and dynamic logic on finite traces
LTLf and LDLf are well-known logics on finite traces. We review PLTLf and PLDLf, their pure-
past versions. These are interpreted backward from the end of the trace towards the …
Neural ordinary differential equation control of dynamics on graphs
We study the ability of neural networks to calculate feedback control signals that steer
trajectories of continuous-time nonlinear dynamical systems on graphs, which we represent …
Reinforcement learning for joint optimization of multiple rewards
Finding optimal policies which maximize long term rewards of Markov Decision Processes
requires the use of dynamic programming and backward induction to solve the Bellman equations …