[PDF][PDF] Trustllm: Trustworthiness in large language models

L Sun, Y Huang, H Wang, S Wu, Q Zhang… - arxiv preprint arxiv …, 2024 - mosis.eecs.utk.edu
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

[HTML][HTML] AI deception: A survey of examples, risks, and potential solutions

PS Park, S Goldstein, A O'Gara, M Chen, D Hendrycks - Patterns, 2024 - cell.com
This paper argues that a range of current AI systems have learned how to deceive humans.
We define deception as the systematic inducement of false beliefs in the pursuit of some …