Combating misinformation in the age of LLMs: Opportunities and challenges

C Chen, K Shu - AI Magazine, 2024 - Wiley Online Library
Misinformation such as fake news and rumors is a serious threat to information ecosystems
and public trust. The emergence of large language models (LLMs) has great potential to …

Survey of vulnerabilities in large language models revealed by adversarial attacks

E Shayegani, MAA Mamun, Y Fu, P Zaree… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) are swiftly advancing in architecture and capability, and as
they integrate more deeply into complex systems, the urgency to scrutinize their security …

TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu, Q Zhang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

Jailbreak and guard aligned language models with only few in-context demonstrations

Z Wei, Y Wang, A Li, Y Mo, Y Wang - arXiv preprint arXiv:2310.06387, 2023 - arxiv.org
Large Language Models (LLMs) have shown remarkable success in various tasks, yet their
safety and the risk of generating harmful content remain pressing concerns. In this paper, we …

Position: TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu… - International …, 2024 - proceedings.mlr.press
Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

AutoDAN: Generating stealthy jailbreak prompts on aligned large language models

X Liu, N Xu, M Chen, C Xiao - arXiv preprint arXiv:2310.04451, 2023 - arxiv.org
Aligned Large Language Models (LLMs) are powerful language understanding and
decision-making tools that are created through extensive alignment with human feedback …

Multilingual jailbreak challenges in large language models

Y Deng, W Zhang, SJ Pan, L Bing - arXiv preprint arXiv:2310.06474, 2023 - arxiv.org
While large language models (LLMs) exhibit remarkable capabilities across a wide range of
tasks, they pose potential safety concerns, such as the "jailbreak" problem, wherein …

Making them ask and answer: Jailbreaking large language models in few queries via disguise and reconstruction

T Liu, Y Zhang, Z Zhao, Y Dong, G Meng… - 33rd USENIX Security …, 2024 - usenix.org
In recent years, large language models (LLMs) have demonstrated notable success across
various tasks, but the trustworthiness of LLMs is still an open problem. One specific threat is …

How Johnny can persuade LLMs to jailbreak them: Rethinking persuasion to challenge AI safety by humanizing LLMs

Y Zeng, H Lin, J Zhang, D Yang, R Jia… - arXiv preprint arXiv …, 2024 - arxiv.org
Most traditional AI safety research has approached AI models as machines and centered on
algorithm-focused attacks developed by security experts. As large language models (LLMs) …