Active observing in continuous-time control
The control of continuous-time environments while actively deciding when to take costly
observations in time is a crucial yet unexplored problem, particularly relevant to real-world …
Query-dependent prompt evaluation and optimization with offline inverse RL
In this study, we aim to enhance the arithmetic reasoning ability of Large Language Models
(LLMs) through zero-shot prompt optimization. We identify a previously overlooked objective …
What is Flagged in Uncertainty Quantification? Latent Density Models for Uncertainty Categorization
Uncertainty quantification (UQ) is essential for creating trustworthy machine learning
models. Recent years have seen a steep rise in UQ methods that can flag suspicious …
Towards robust offline reinforcement learning under diverse data corruption
Offline reinforcement learning (RL) presents a promising approach for learning effective
policies from offline datasets without the need for costly or unsafe interactions with the …
Reinforcement learning in the era of LLMs: What is essential? What is needed? An RL perspective on RLHF, prompting, and beyond
H Sun - arXiv preprint arXiv:2310.06147, 2023 - arxiv.org
Recent advancements in Large Language Models (LLMs) have garnered wide attention and
led to successful products such as ChatGPT and GPT-4. Their proficiency in adhering to …
Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment
Aligning Large Language Models (LLMs) is crucial for enhancing their safety and utility.
However, existing methods, primarily based on preference datasets, face challenges such …
Supervised Fine-Tuning as Inverse Reinforcement Learning
H Sun - arXiv preprint arXiv:2403.12017, 2024 - arxiv.org
The prevailing approach to aligning Large Language Models (LLMs) typically relies on
human or AI feedback and assumes access to specific types of preference datasets. In our …
Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling
Learning policies from offline datasets through offline reinforcement learning (RL) holds
promise for scaling data-driven decision-making and avoiding unsafe and costly online …
Defining Expertise: Applications to Treatment Effect Estimation
Decision-makers are often experts of their domain and take actions based on their domain
knowledge. Doctors, for instance, may prescribe treatments by predicting the likely outcome …