Can Editing LLMs Inject Harm?
Knowledge editing has been increasingly adopted to correct the false or outdated
knowledge in Large Language Models (LLMs). Meanwhile, one critical but under-explored …
An Overview of Trustworthy AI: Advances in IP Protection, Privacy-preserving Federated Learning, Security Verification, and GAI Safety Alignment
AI has undergone a remarkable evolution journey marked by groundbreaking milestones.
Like any powerful tool, it can be turned into a weapon for devastation in the wrong hands …
On Evaluating the Durability of Safeguards for Open-Weight LLMs
Stakeholders — from model developers to policymakers — seek to minimize the dual-use risks
of large language models (LLMs). An open challenge to this goal is whether technical …
Differentially private kernel density estimation
We introduce a refined differentially private (DP) data structure for kernel density estimation
(KDE), offering not only improved privacy-utility tradeoff but also better efficiency over prior …
A Realistic Threat Model for Large Language Model Jailbreaks
A plethora of jailbreaking attacks have been proposed to obtain harmful responses from
safety-tuned LLMs. In their original settings, these methods all largely succeed in coercing …
Data-Aware Training Quality Monitoring and Certification for Reliable Deep Learning
Deep learning models excel at capturing complex representations through sequential layers
of linear and non-linear transformations, yet their inherent black-box nature and multi-modal …
AEIOU: A Unified Defense Framework against NSFW Prompts in Text-to-Image Models
As text-to-image (T2I) models continue to advance and gain widespread adoption, their
associated safety issues are becoming increasingly prominent. Malicious users often exploit …
Position: We Need An Adaptive Interpretation of Helpful, Honest, and Harmless Principles
The Helpful, Honest, and Harmless (HHH) principle is a foundational framework for aligning
AI systems with human values. However, existing interpretations of the HHH principle often …
SoK: Unifying Cybersecurity and Cybersafety of Multimodal Foundation Models with an Information Theory Approach
Multimodal foundation models (MFMs) represent a significant advancement in artificial
intelligence, combining diverse data modalities to enhance learning and understanding …
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
S. Han - arXiv preprint arXiv:2410.18114, 2024 - arxiv.org
The advancements in generative AI inevitably raise concerns about their risks and safety
implications, which, in turn, catalyzes significant progress in AI safety. However, as this …