VideoScore: Building automatic metrics to simulate fine-grained human feedback for video generation

X He, D Jiang, G Zhang, M Ku, A Soni, S Siu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent years have witnessed great advances in video generation. However, the
development of automatic video metrics is lagging significantly behind. None of the existing …

Foundational autoraters: Taming large language models for better automatic evaluation

T Vu, K Krishna, S Alzubi, C Tar, M Faruqui… - arXiv preprint arXiv …, 2024 - arxiv.org
As large language models (LLMs) advance, it becomes more challenging to reliably
evaluate their output due to the high costs of human evaluation. To make progress towards …

Skywork-Reward: Bag of tricks for reward modeling in LLMs

CY Liu, L Zeng, J Liu, R Yan, J He, C Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we introduce a collection of methods to enhance reward modeling for LLMs,
focusing specifically on data-centric techniques. We propose effective data selection and …

Critique-out-loud reward models

Z Ankner, M Paul, B Cui, JD Chang… - arXiv preprint arXiv …, 2024 - arxiv.org
Traditionally, reward models used for reinforcement learning from human feedback (RLHF)
are trained to directly predict preference scores without leveraging the generation …

Reinforcement Learning Enhanced LLMs: A Survey

S Wang, S Zhang, J Zhang, R Hu, X Li, T Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper surveys research in the rapidly growing field of enhancing large language
models (LLMs) with reinforcement learning (RL), a technique that enables LLMs to improve …

The good, the bad, and the greedy: Evaluation of LLMs should not ignore non-determinism

Y Song, G Wang, S Li, BY Lin - arXiv preprint arXiv:2407.10457, 2024 - arxiv.org
Current evaluations of large language models (LLMs) often overlook non-determinism,
typically focusing on a single output per example. This limits our understanding of LLM …

Uncertainty-aware reward model: Teaching reward models to know what is unknown

X Lou, D Yan, W Shen, Y Yan, J Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
Reward models (RM) play a critical role in aligning generations of large language models
(LLM) to human expectations. However, prevailing RMs fail to capture the stochasticity …

Building an Ethical and Trustworthy Biomedical AI Ecosystem for the Translational and Clinical Integration of Foundation Models

BS Sankar, D Gilliland, J Rincon, H Hermjakob, Y Yan… - Bioengineering, 2024 - mdpi.com
Foundation Models (FMs) are gaining increasing attention in the biomedical artificial
intelligence (AI) ecosystem due to their ability to represent and contextualize multimodal …

Thinking LLMs: General instruction following with thought generation

T Wu, J Lan, W Yuan, J Jiao, J Weston… - arXiv preprint arXiv …, 2024 - arxiv.org
LLMs are typically trained to answer user questions or follow instructions similarly to how
human experts respond. However, in the standard alignment framework they lack the basic …

Self-generated critiques boost reward modeling for language models

Y Yu, Z Chen, A Zhang, L Tan, C Zhu, RY Pang… - arXiv preprint arXiv …, 2024 - arxiv.org
Reward modeling is crucial for aligning large language models (LLMs) with human
preferences, especially in reinforcement learning from human feedback (RLHF). However …