Combating misinformation in the age of LLMs: Opportunities and challenges
Misinformation such as fake news and rumors is a serious threat for information ecosystems
and public trust. The emergence of large language models (LLMs) has great potential to …
AI deception: A survey of examples, risks, and potential solutions
This paper argues that a range of current AI systems have learned how to deceive humans.
We define deception as the systematic inducement of false beliefs in the pursuit of some …
TrustLLM: Trustworthiness in large language models
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …
Position: TrustLLM: Trustworthiness in large language models
Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …
AI alignment: A comprehensive survey
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …
Harms from increasingly agentic algorithmic systems
Research in Fairness, Accountability, Transparency, and Ethics (FATE) has established
many sources and forms of algorithmic harm, in domains as diverse as health care, finance …
Building machines that learn and think with people
What do we want from machine intelligence? We envision machines that are not just tools
for thought but partners in thought: reasonable, insightful, knowledgeable, reliable and …
Large language model alignment: A survey
Recent years have witnessed remarkable progress made in large language models (LLMs).
Such advancements, while garnering significant attention, have concurrently elicited various …
A survey of reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …
How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions
Large language models (LLMs) can "lie", which we define as outputting false statements
despite "knowing" the truth in a demonstrable sense. LLMs might "lie", for example, when …