When LLMs meet cybersecurity: A systematic literature review

J Zhang, H Bu, H Wen, Y Liu, H Fei… - …, 2025 - cybersecurity.springeropen.com
The rapid development of large language models (LLMs) has opened new avenues across
various fields, including cybersecurity, which faces an evolving threat landscape and …

SafetyPrompts: a systematic review of open datasets for evaluating and improving large language model safety

P Röttger, F Pernisi, B Vidgen, D Hovy - arXiv preprint arXiv:2404.05399, 2024 - arxiv.org
The last two years have seen a rapid growth in concerns around the safety of large
language models (LLMs). Researchers and practitioners have met these concerns by …

Detectors for safe and reliable LLMs: Implementations, uses, and limitations

S Achintalwar, AA Garcia, A Anaby-Tavor… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output
to biased and toxic generations. Due to several limiting factors surrounding LLMs (training …

Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI

A Rawat, S Schoepf, G Zizzo, G Cornacchia… - arXiv preprint arXiv …, 2024 - arxiv.org
As generative AI, particularly large language models (LLMs), becomes increasingly
integrated into production applications, new attack surfaces and vulnerabilities emerge and …

Decolonial AI Alignment: Openness, Viśeṣa-Dharma, and Including Excluded Knowledges

KR Varshney - Proceedings of the AAAI/ACM Conference on AI, Ethics …, 2024 - ojs.aaai.org
Prior work has explicated the coloniality of artificial intelligence (AI) development and
deployment through mechanisms such as extractivism, automation, sociological …

Alignment Studio: Aligning large language models to particular contextual regulations

S Achintalwar, I Baldini, D Bouneffouf… - IEEE Internet …, 2024 - ieeexplore.ieee.org
The alignment of large language models is usually done by model providers to add or
control behaviors that are common or universally understood across use cases and …

DARE to Diversify: DAta Driven and Diverse LLM REd Teaming

M Nagireddy, B Guillén Pegueroles… - Proceedings of the 30th …, 2024 - dl.acm.org
Large language models (LLMs) have been rapidly adopted, as showcased by ChatGPT's
overnight popularity, and are integrated in products used by millions of people every day …

Dynamic normativity: Necessary and sufficient conditions for value alignment

NK Corrêa - arXiv preprint arXiv:2406.11039, 2024 - arxiv.org
The critical inquiry pervading the realm of Philosophy, and perhaps extending its influence
across all Humanities disciplines, revolves around the intricacies of morality and normativity …

Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs

G Zizzo, G Cornacchia, K Fraser, MZ Hameed… - arXiv preprint arXiv …, 2025 - arxiv.org
As large language models (LLMs) become integrated into everyday applications, ensuring
their robustness and security is increasingly critical. In particular, LLMs can be manipulated …

On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective

Y Huang, C Gao, S Wu, H Wang, X Wang… - arXiv preprint arXiv …, 2025 - arxiv.org
Generative Foundation Models (GenFMs) have emerged as transformative tools. However,
their widespread adoption raises critical concerns regarding trustworthiness across …