Evaluation and mitigation of the limitations of large language models in clinical decision-making

P Hager, F Jungmann, R Holland, K Bhagat… - Nature Medicine, 2024 - nature.com
Clinical decision-making is one of the most impactful parts of a physician's responsibilities
and stands to benefit greatly from artificial intelligence solutions and large language models …

RLAIF: Scaling reinforcement learning from human feedback with AI feedback

H Lee, S Phatale, H Mansoor, KR Lu, T Mesnard… - 2023 - openreview.net
Reinforcement learning from human feedback (RLHF) is an effective technique for aligning
large language models (LLMs) to human preferences, but gathering high-quality human …

Can generalist foundation models outcompete special-purpose tuning? Case study in medicine

H Nori, YT Lee, S Zhang, D Carignan, R Edgar… - arXiv preprint arXiv …, 2023 - arxiv.org
Generalist foundation models such as GPT-4 have displayed surprising capabilities in a
wide variety of domains and tasks. Yet, there is a prevalent assumption that they cannot …

Evaluating large language models at evaluating instruction following

Z Zeng, J Yu, T Gao, Y Meng, T Goyal… - arXiv preprint arXiv …, 2023 - arxiv.org
As research in large language models (LLMs) continues to accelerate, LLM-based
evaluation has emerged as a scalable and cost-effective alternative to human evaluations …

Do LLMs exhibit human-like response biases? A case study in survey design

L Tjuatja, V Chen, T Wu, A Talwalkar… - Transactions of the …, 2024 - direct.mit.edu
One widely cited barrier to the adoption of LLMs as proxies for humans in subjective tasks is
their sensitivity to prompt wording—but interestingly, humans also display sensitivities to …

Preference learning algorithms do not learn preference rankings

A Chen, S Malladi, L Zhang, X Chen… - Advances in …, 2024 - proceedings.neurips.cc
Preference learning algorithms (e.g., RLHF and DPO) are frequently used to steer LLMs to
produce generations that are more preferred by humans, but our understanding of their …

RLAIF vs. RLHF: Scaling reinforcement learning from human feedback with AI feedback

H Lee, S Phatale, H Mansoor, T Mesnard… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) has proven effective in aligning large
language models (LLMs) with human preferences, but gathering high-quality preference …

The prompt report: A systematic survey of prompting techniques

S Schulhoff, M Ilie, N Balepur… - arXiv preprint …, 2024 - readwise-assets.s3.amazonaws.com
Generative Artificial Intelligence (GenAI) systems are being increasingly deployed
across all parts of industry and research settings. Developers and end users interact with …

Introducing v0.5 of the AI Safety Benchmark from MLCommons

B Vidgen, A Agrawal, AM Ahmed, V Akinwande… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the
MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to …

A survey on stability of learning with limited labelled data and its sensitivity to the effects of randomness

B Pecher, I Srba, M Bielikova - ACM Computing Surveys, 2024 - dl.acm.org
Learning with limited labelled data, such as prompting, in-context learning, fine-tuning, meta-
learning, or few-shot learning, aims to effectively train a model using only a small amount of …