Combating misinformation in the age of LLMs: Opportunities and challenges

C Chen, K Shu - AI Magazine, 2024 - Wiley Online Library
Misinformation such as fake news and rumors is a serious threat to information ecosystems
and public trust. The emergence of large language models (LLMs) has great potential to …

Survey of vulnerabilities in large language models revealed by adversarial attacks

E Shayegani, MAA Mamun, Y Fu, P Zaree… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) are swiftly advancing in architecture and capability, and as
they integrate more deeply into complex systems, the urgency to scrutinize their security …

TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu, Q Zhang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

Jailbreak and guard aligned language models with only few in-context demonstrations

Z Wei, Y Wang, A Li, Y Mo, Y Wang - arXiv preprint arXiv:2310.06387, 2023 - arxiv.org
Large Language Models (LLMs) have shown remarkable success in various tasks, yet their
safety and the risk of generating harmful content remain pressing concerns. In this paper, we …

Position: TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu… - International …, 2024 - proceedings.mlr.press
Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

AutoDAN: Generating stealthy jailbreak prompts on aligned large language models

X Liu, N Xu, M Chen, C Xiao - arXiv preprint arXiv:2310.04451, 2023 - arxiv.org
Aligned Large Language Models (LLMs) are powerful language understanding and
decision-making tools that are created through extensive alignment with human feedback …

Multilingual jailbreak challenges in large language models

Y Deng, W Zhang, SJ Pan, L Bing - arXiv preprint arXiv:2310.06474, 2023 - arxiv.org
While large language models (LLMs) exhibit remarkable capabilities across a wide range of
tasks, they pose potential safety concerns, such as the "jailbreak" problem, wherein …

Making them ask and answer: Jailbreaking large language models in few queries via disguise and reconstruction

T Liu, Y Zhang, Z Zhao, Y Dong, G Meng… - 33rd USENIX Security …, 2024 - usenix.org
In recent years, large language models (LLMs) have demonstrated notable success across
various tasks, but the trustworthiness of LLMs is still an open problem. One specific threat is …

How Johnny can persuade LLMs to jailbreak them: Rethinking persuasion to challenge AI safety by humanizing LLMs

Y Zeng, H Lin, J Zhang, D Yang, R Jia… - arXiv preprint arXiv …, 2024 - arxiv.org
Most traditional AI safety research has approached AI models as machines and centered on
algorithm-focused attacks developed by security experts. As large language models (LLMs) …