Scalable agent alignment via reward modeling: a research direction
J Leike, D Krueger, T Everitt, M Martic, V Maini… - arXiv preprint arXiv:1811.07871, 2018 - arxiv.org
Mapping out the Space of Human Feedback for Reinforcement Learning: A Conceptual Framework
Reinforcement Learning from Human Feedback (RLHF) has become a powerful tool to fine-tune or train agentic machine learning models. Similar to how humans interact in social …
Instructed reinforcement learning control of safe autonomous J-turn vehicle maneuvers
This paper presents a safe control policy search for autonomous J-turn maneuvers inspired by professional car drivers. These drivers have been highly trained to achieve highly …
Safe motion control and planning for autonomous racing vehicles
A Arab - 2021 - search.proquest.com
In the future of the autonomous car industry, saving lives might depend on more demanding maneuvers than what average drivers know how to do. Professional race car drivers, as …
Taming the Sample Complexity in Agentifying AI Systems by the Exploitation of Explicit Human Knowledge
L Guan - 2024 - search.proquest.com
Extensive efforts have been dedicated to the development of AI agents that can independently carry out sequential decision-making tasks. Learning-based solutions …
Guidance Priors to Reduce Human Feedback Burden in Sequential Decision Making
M Verma - 2024 - search.proquest.com
Human-in-the-loop sequential decision making, such as Reinforcement Learning from Human Feedback (RLHF) or behavior synthesis, leverages human feedback to the AI system …