VideoScore: Building automatic metrics to simulate fine-grained human feedback for video generation
Recent years have witnessed great advances in video generation. However, the
development of automatic video metrics is lagging significantly behind. None of the existing …
Foundational autoraters: Taming large language models for better automatic evaluation
As large language models (LLMs) advance, it becomes more challenging to reliably
evaluate their output due to the high costs of human evaluation. To make progress towards …
Skywork-Reward: Bag of tricks for reward modeling in LLMs
In this report, we introduce a collection of methods to enhance reward modeling for LLMs,
focusing specifically on data-centric techniques. We propose effective data selection and …
Critique-out-loud reward models
Traditionally, reward models used for reinforcement learning from human feedback (RLHF)
are trained to directly predict preference scores without leveraging the generation …
Reinforcement Learning Enhanced LLMs: A Survey
This paper surveys research in the rapidly growing field of enhancing large language
models (LLMs) with reinforcement learning (RL), a technique that enables LLMs to improve …
The good, the bad, and the greedy: Evaluation of LLMs should not ignore non-determinism
Current evaluations of large language models (LLMs) often overlook non-determinism,
typically focusing on a single output per example. This limits our understanding of LLM …
Uncertainty-aware reward model: Teaching reward models to know what is unknown
Reward models (RMs) play a critical role in aligning the generations of large language models
(LLMs) with human expectations. However, prevailing RMs fail to capture the stochasticity …
Building an Ethical and Trustworthy Biomedical AI Ecosystem for the Translational and Clinical Integration of Foundation Models
Foundation Models (FMs) are gaining increasing attention in the biomedical artificial
intelligence (AI) ecosystem due to their ability to represent and contextualize multimodal …
Thinking LLMs: General instruction following with thought generation
LLMs are typically trained to answer user questions or follow instructions similarly to how
human experts respond. However, in the standard alignment framework they lack the basic …
Self-generated critiques boost reward modeling for language models
Reward modeling is crucial for aligning large language models (LLMs) with human
preferences, especially in reinforcement learning from human feedback (RLHF). However …