AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, so do risks from misalignment. To provide a comprehensive …

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …
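
For orientation, a minimal sketch of the pairwise (Bradley-Terry) reward-model loss that sits at the core of most RLHF pipelines discussed in this line of work; the `reward_model` interface and tensor shapes are illustrative assumptions, not code from the paper:

```python
import torch.nn.functional as F

def reward_model_loss(reward_model, chosen_ids, rejected_ids):
    # Bradley-Terry pairwise objective: the human-preferred ("chosen")
    # response should score higher than the rejected one.
    # `reward_model` is assumed to map token ids to one scalar per sequence.
    r_chosen = reward_model(chosen_ids)      # shape: (batch,)
    r_rejected = reward_model(rejected_ids)  # shape: (batch,)
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```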

Reinforcement learning for generative AI: State of the art, opportunities and open research challenges

G Franceschelli, M Musolesi - Journal of Artificial Intelligence Research, 2024 - jair.org
Generative Artificial Intelligence (AI) is one of the most exciting developments in
Computer Science of the last decade. At the same time, Reinforcement Learning (RL) has …

MiniLLM: Knowledge distillation of large language models

Y Gu, L Dong, F Wei, M Huang - arXiv preprint arXiv:2306.08543, 2023 - arxiv.org
Knowledge Distillation (KD) is a promising technique for reducing the high computational
demand of large language models (LLMs). However, previous KD methods are primarily …
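
MiniLLM's key departure from standard KD is the direction of the KL divergence; a token-level sketch of the two objectives is shown below, assuming full teacher and student logits are available (the paper itself optimizes a sequence-level reverse KL with policy-gradient methods):

```python
import torch.nn.functional as F

def forward_kl(teacher_logits, student_logits):
    # Standard KD: KL(teacher || student); the student must spread mass
    # over every mode of the teacher distribution.
    p = F.log_softmax(teacher_logits, dim=-1)
    q = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(q, p, log_target=True, reduction="batchmean")

def reverse_kl(teacher_logits, student_logits):
    # MiniLLM-style objective: KL(student || teacher); the student may
    # concentrate on the teacher's high-probability modes instead.
    p = F.log_softmax(teacher_logits, dim=-1)
    q = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(p, q, log_target=True, reduction="batchmean")
```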

Diffusion model alignment using direct preference optimization

B Wallace, M Dang, R Rafailov… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large language models (LLMs) are fine-tuned using human comparison data with
Reinforcement Learning from Human Feedback (RLHF) methods to make them better …
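
The underlying DPO objective, which this paper adapts from language models to diffusion models, can be sketched as follows; inputs are log-likelihoods of the preferred and rejected samples under the trained model and a frozen reference, and in the diffusion setting these likelihoods are replaced by a tractable bound:

```python
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Implicit rewards are the log-likelihood ratios against the reference model.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # Logistic loss on the reward margin between chosen and rejected samples.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```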

Reinforced self-training (ReST) for language modeling

C Gulcehre, TL Paine, S Srinivasan… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) can improve the quality of large
language model (LLM) outputs by aligning them with human preferences. We propose a …
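
The ReST recipe alternates a Grow step (sampling from the current policy) with an Improve step (reward-filtered fine-tuning); a schematic loop is sketched below, where `policy.sample`, `reward_fn`, and `finetune` are assumed interfaces rather than the paper's code:

```python
def rest_sketch(policy, prompts, reward_fn, finetune, thresholds=(0.5, 0.7, 0.9)):
    for threshold in thresholds:
        # Grow: sample candidate outputs from the current policy.
        dataset = [(p, policy.sample(p)) for p in prompts]
        # Improve: keep samples whose reward clears the threshold and
        # fine-tune on them; thresholds typically increase over iterations.
        filtered = [(p, y) for p, y in dataset if reward_fn(p, y) >= threshold]
        policy = finetune(policy, filtered)
    return policy
```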

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

A Rame, G Couairon, C Dancette… - Advances in …, 2023 - proceedings.neurips.cc
Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned
on labeled data. Reinforcement learning, notably from human feedback (RLHF), can further …
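
The "rewarded soup" itself is a linear interpolation of parameters from models fine-tuned on different rewards; a minimal sketch, assuming plain PyTorch modules with identical architectures and interpolation coefficients that sum to 1:

```python
import copy
import torch

def rewarded_soup(models, coeffs):
    # Average the parameters of several fine-tuned copies of the same network,
    # one per reward, weighted by the interpolation coefficients.
    soup = copy.deepcopy(models[0])
    param_dicts = [dict(m.named_parameters()) for m in models]
    with torch.no_grad():
        for name, param in soup.named_parameters():
            param.copy_(sum(c * d[name] for c, d in zip(coeffs, param_dicts)))
    return soup
```

Varying the coefficients traces out a family of models that trades off the different rewards without retraining.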

The alignment problem from a deep learning perspective

R Ngo, L Chan, S Mindermann - arXiv preprint arXiv:2209.00626, 2022 - arxiv.org
In coming years or decades, artificial general intelligence (AGI) may surpass human
capabilities at many critical tasks. We argue that, without substantial effort to prevent it, AGIs …

A long way to go: Investigating length correlations in RLHF

P Singhal, T Goyal, J Xu, G Durrett - arXiv preprint arXiv:2310.03716, 2023 - arxiv.org
Great success has been reported using Reinforcement Learning from Human Feedback
(RLHF) to align large language models, with open preference datasets enabling wider …
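
A small sketch of the kind of diagnostic this line of work reports: correlating reward-model scores with response length, where a strong positive correlation suggests the reward model (and hence RLHF) favors longer outputs. Lengths here are approximated by whitespace tokenization for illustration:

```python
import numpy as np

def length_reward_correlation(responses, rewards):
    # Pearson correlation between response length and reward score.
    lengths = np.array([len(r.split()) for r in responses], dtype=float)
    scores = np.array(rewards, dtype=float)
    return float(np.corrcoef(lengths, scores)[0, 1])
```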