Enhancing large vision language models with self-training on image comprehension
Large vision language models (LVLMs) integrate large language models (LLMs) with pre-
trained vision encoders, thereby activating the perception capability of the model to …
STaR-GATE: Teaching language models to ask clarifying questions
When prompting language models to complete a task, users often leave important aspects
unsaid. While asking questions could resolve this ambiguity (GATE; Li et al., 2023), models …
Generative reward models
Reinforcement Learning from Human Feedback (RLHF) has greatly improved the
performance of modern Large Language Models (LLMs). The RLHF process is resource …
PERSONA: A Reproducible Testbed for Pluralistic Alignment
The rapid advancement of language models (LMs) necessitates robust alignment with
diverse user values. However, current preference optimization approaches often fail to …
Aligning large language models via self-steering optimization
Automated alignment develops alignment systems with minimal human intervention. The
key to automated alignment lies in providing learnable and accurate preference signals for …
Is Free Self-Alignment Possible?
Aligning pretrained language models (LMs) is a complex and resource-intensive process,
often requiring access to large amounts of ground-truth preference data and substantial …
LLM Safety Alignment is Divergence Estimation in Disguise
We propose a theoretical framework demonstrating that popular Large Language Model
(LLM) alignment methods, including Reinforcement Learning from Human Feedback (RLHF) …
Can Language Models Safeguard Themselves, Instantly and For Free?
Aligning pretrained language models (LMs) to handle a new safety scenario is normally
difficult and expensive, often requiring access to large amounts of ground-truth preference …
[PDF] Generative Reward Models: A Unified Approach to RLHF and RLAIF
D Mahan, D Van Phung, R Rafailov, C Blagden, N Lile, L Castricato - static.synthlabs.ai
Reinforcement Learning from Human Feedback (RLHF) has greatly improved the
performance of modern Large Language Models (LLMs). The RLHF process is resource …