AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, so do risks from misalignment. To provide a comprehensive …

Automatically correcting large language models: Surveying the landscape of diverse self-correction strategies

L Pan, M Saxon, W Xu, D Nathani, X Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable performance across a wide
array of NLP tasks. However, their efficacy is undermined by undesired and inconsistent …

BeaverTails: Towards improved safety alignment of LLM via a human-preference dataset

J Ji, M Liu, J Dai, X Pan, C Zhang… - Advances in …, 2023 - proceedings.neurips.cc
In this paper, we introduce the BeaverTails dataset, aimed at fostering research on safety
alignment in large language models (LLMs). This dataset uniquely separates annotations of …

Self-RAG: Learning to retrieve, generate, and critique through self-reflection

A Asai, Z Wu, Y Wang, A Sil… - The Twelfth International …, 2023 - openreview.net
Despite their remarkable capabilities, large language models (LLMs) often produce
responses containing factual inaccuracies due to their sole reliance on the parametric …

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

Chain-of-verification reduces hallucination in large language models

S Dhuliawala, M Komeili, J Xu, R Raileanu, X Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Generation of plausible yet incorrect factual information, termed hallucination, is an
unsolved issue in large language models. We study the ability of language models to …

RLHF-V: Towards trustworthy MLLMs via behavior alignment from fine-grained correctional human feedback

T Yu, Y Yao, H Zhang, T He, Y Han… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal Large Language Models (MLLMs) have recently demonstrated
impressive capabilities in multimodal understanding, reasoning, and interaction. However …

Safe RLHF: Safe reinforcement learning from human feedback

J Dai, X Pan, R Sun, J Ji, X Xu, M Liu, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
With the development of large language models (LLMs), striking a balance between the
performance and safety of AI systems has never been more critical. However, the inherent …

Detecting and preventing hallucinations in large vision language models

A Gunjal, J Yin, E Bas - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Instruction-tuned Large Vision Language Models (LVLMs) have significantly advanced in
generalizing across a diverse set of multi-modal tasks, especially for Visual Question …

Preference ranking optimization for human alignment

F Song, B Yu, M Li, H Yu, F Huang, Y Li… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Large language models (LLMs) often contain misleading content, emphasizing the need to
align them with human values to ensure secure AI systems. Reinforcement learning from …