VideoScore: Building automatic metrics to simulate fine-grained human feedback for video generation

X He, D Jiang, G Zhang, M Ku, A Soni, S Siu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent years have witnessed great advances in video generation. However, the
development of automatic video metrics is lagging significantly behind. None of the existing …

Foundational autoraters: Taming large language models for better automatic evaluation

T Vu, K Krishna, S Alzubi, C Tar, M Faruqui… - arXiv preprint arXiv …, 2024 - arxiv.org
As large language models (LLMs) advance, it becomes more challenging to reliably
evaluate their output due to the high costs of human evaluation. To make progress towards …

Skywork-Reward: Bag of tricks for reward modeling in LLMs

CY Liu, L Zeng, J Liu, R Yan, J He, C Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we introduce a collection of methods to enhance reward modeling for LLMs,
focusing specifically on data-centric techniques. We propose effective data selection and …

Critique-out-loud reward models

Z Ankner, M Paul, B Cui, JD Chang… - arXiv preprint arXiv …, 2024 - arxiv.org
Traditionally, reward models used for reinforcement learning from human feedback (RLHF)
are trained to directly predict preference scores without leveraging the generation …

Reinforcement Learning Enhanced LLMs: A Survey

S Wang, S Zhang, J Zhang, R Hu, X Li, T Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper surveys research in the rapidly growing field of enhancing large language
models (LLMs) with reinforcement learning (RL), a technique that enables LLMs to improve …

The good, the bad, and the greedy: Evaluation of LLMs should not ignore non-determinism

Y Song, G Wang, S Li, BY Lin - arXiv preprint arXiv:2407.10457, 2024 - arxiv.org
Current evaluations of large language models (LLMs) often overlook non-determinism,
typically focusing on a single output per example. This limits our understanding of LLM …

Uncertainty-aware reward model: Teaching reward models to know what is unknown

X Lou, D Yan, W Shen, Y Yan, J Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
Reward models (RM) play a critical role in aligning generations of large language models
(LLM) to human expectations. However, prevailing RMs fail to capture the stochasticity …

Building an Ethical and Trustworthy Biomedical AI Ecosystem for the Translational and Clinical Integration of Foundation Models

BS Sankar, D Gilliland, J Rincon, H Hermjakob, Y Yan… - Bioengineering, 2024 - mdpi.com
Foundation Models (FMs) are gaining increasing attention in the biomedical artificial
intelligence (AI) ecosystem due to their ability to represent and contextualize multimodal …

Thinking LLMs: General instruction following with thought generation

T Wu, J Lan, W Yuan, J Jiao, J Weston… - arXiv preprint arXiv …, 2024 - arxiv.org
LLMs are typically trained to answer user questions or follow instructions similarly to how
human experts respond. However, in the standard alignment framework they lack the basic …

Self-generated critiques boost reward modeling for language models

Y Yu, Z Chen, A Zhang, L Tan, C Zhu, RY Pang… - arXiv preprint arXiv …, 2024 - arxiv.org
Reward modeling is crucial for aligning large language models (LLMs) with human
preferences, especially in reinforcement learning from human feedback (RLHF). However …