Combating misinformation in the age of LLMs: Opportunities and challenges

C Chen, K Shu - AI Magazine, 2024 - Wiley Online Library
Misinformation such as fake news and rumors is a serious threat to information ecosystems
and public trust. The emergence of large language models (LLMs) has great potential to …

AI deception: A survey of examples, risks, and potential solutions

PS Park, S Goldstein, A O'Gara, M Chen, D Hendrycks - Patterns, 2024 - cell.com
This paper argues that a range of current AI systems have learned how to deceive humans.
We define deception as the systematic inducement of false beliefs in the pursuit of some …

TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu, Q Zhang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Harms from increasingly agentic algorithmic systems

A Chan, R Salganik, A Markelius, C Pang… - Proceedings of the …, 2023 - dl.acm.org
Research in Fairness, Accountability, Transparency, and Ethics (FATE) has established
many sources and forms of algorithmic harm, in domains as diverse as health care, finance …

Building machines that learn and think with people

KM Collins, I Sucholutsky, U Bhatt, K Chandra… - Nature human …, 2024 - nature.com
What do we want from machine intelligence? We envision machines that are not just tools
for thought but partners in thought: reasonable, insightful, knowledgeable, reliable and …

Large language model alignment: A survey

T Shen, R Jin, Y Huang, C Liu, W Dong, Z Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent years have witnessed remarkable progress in large language models (LLMs).
Such advancements, while garnering significant attention, have concurrently elicited various …

A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …

How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions

L Pacchiardi, AJ Chan, S Mindermann… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) can "lie", which we define as outputting false statements
despite "knowing" the truth in a demonstrable sense. LLMs might "lie", for example, when …