Tool learning with foundation models

Y Qin, S Hu, Y Lin, W Chen, N Ding, G Cui… - ACM Computing …, 2024 - dl.acm.org
Humans possess an extraordinary ability to create and utilize tools. With the advent of
foundation models, artificial intelligence systems have the potential to be equally adept in …

AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, so do risks from misalignment. To provide a comprehensive …

Detecting hallucinations in large language models using semantic entropy

S Farquhar, J Kossen, L Kuhn, Y Gal - Nature, 2024 - nature.com
Large language model (LLM) systems, such as ChatGPT or Gemini, can show impressive
reasoning and question-answering capabilities but often 'hallucinate' false outputs and …

Connecting the dots in trustworthy Artificial Intelligence: From AI principles, ethics, and key requirements to responsible AI systems and regulation

N Díaz-Rodríguez, J Del Ser, M Coeckelbergh… - Information …, 2023 - Elsevier
Trustworthy Artificial Intelligence (AI) is based on seven technical requirements
sustained over three main pillars that should be met throughout the system's entire life cycle …

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

GPQA: A graduate-level Google-proof Q&A benchmark

D Rein, BL Hou, AC Stickland, J Petty… - First Conference on …, 2024 - openreview.net
We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain
experts in biology, physics, and chemistry. We ensure that the questions are high-quality …

RLAIF: Scaling reinforcement learning from human feedback with AI feedback

H Lee, S Phatale, H Mansoor, KR Lu, T Mesnard… - 2023 - openreview.net
Reinforcement learning from human feedback (RLHF) is an effective technique for aligning
large language models (LLMs) to human preferences, but gathering high-quality human …

Tree of attacks: Jailbreaking black-box LLMs automatically

A Mehrotra, M Zampetakis… - Advances in …, 2025 - proceedings.neurips.cc
While Large Language Models (LLMs) display versatile functionality, they continue
to generate harmful, biased, and toxic content, as demonstrated by the prevalence of human …

Guiding pretraining in reinforcement learning with large language models

Y Du, O Watkins, Z Wang, C Colas… - International …, 2023 - proceedings.mlr.press
Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped
reward function. Intrinsically motivated exploration methods address this limitation by …

Beyond the imitation game: Quantifying and extrapolating the capabilities of language models

A Srivastava, A Rastogi, A Rao, AAM Shoeb… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative impact, these new …