Google Scholar

When LLMs meet cybersecurity: A systematic literature review

J Zhang, H Bu, H Wen, Y Liu, H Fei… - …, 2025 - cybersecurity.springeropen.com

The rapid development of large language models (LLMs) has opened new avenues across
various fields, including cybersecurity, which faces an evolving threat landscape and …

Cited by 37 · 2 versions

Threats, attacks, and defenses in machine unlearning: A survey

Z Liu, H Ye, C Chen, Y Zheng, KY Lam - arxiv preprint arxiv:2403.13682, 2024 - arxiv.org

Machine Unlearning (MU) has recently gained considerable attention due to its potential to
achieve Safe AI by removing the influence of specific data from trained Machine Learning …

Cited by 23 · 2 versions

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arxiv preprint arxiv …, 2024 - arxiv.org

This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Cited by 120 · 3 versions

Rethinking machine unlearning for large language models

S Liu, Y Yao, J Jia, S Casper, N Baracaldo… - arxiv preprint arxiv …, 2024 - arxiv.org

We explore machine unlearning (MU) in the domain of large language models (LLMs),
referred to as LLM unlearning. This initiative aims to eliminate undesirable data influence …

Cited by 98 · 2 versions

MUSE: Machine unlearning six-way evaluation for language models

W Shi, J Lee, Y Huang, S Malladi, J Zhao… - arxiv preprint arxiv …, 2024 - arxiv.org

Language models (LMs) are trained on vast amounts of text data, which may include private
and copyrighted content. Data owners may request the removal of their data from a trained …

Cited by 28 · 4 versions

Guardrail baselines for unlearning in LLMs

P Thaker, Y Maurya, S Hu, ZS Wu, V Smith - arxiv preprint arxiv …, 2024 - arxiv.org

Recent work has demonstrated that finetuning is a promising approach to 'unlearn' concepts
from large language models. However, finetuning can be expensive, as it requires both …

Cited by 22 · 2 versions

Against The Achilles' Heel: A Survey on Red Teaming for Generative Models

L Lin, H Mu, Z Zhai, M Wang, Y Wang, R Wang… - Journal of Artificial …, 2025 - jair.org

Generative models are rapidly gaining popularity and being integrated into everyday
applications, raising concerns over their safe use as various vulnerabilities are exposed. In …

Cited by 12 · 2 versions

Open problems in technical AI governance

A Reuel, B Bucknall, S Casper, T Fist, L Soder… - arxiv preprint arxiv …, 2024 - arxiv.org

AI progress is creating a growing range of risks and opportunities, but it is often unclear how
they should be navigated. In many cases, the barriers and uncertainties faced are at least …

Cited by 22 · 4 versions

Machine unlearning in generative AI: A survey

Z Liu, G Dou, Z Tan, Y Tian, M Jiang - arxiv preprint arxiv:2407.20516, 2024 - arxiv.org

Generative AI technologies have been deployed in many places, such as (multimodal) large
language models and vision generative models. Their remarkable performance should be …

Cited by 14 · 3 versions

International Scientific Report on the Safety of Advanced AI (Interim Report)

Y Bengio, S Mindermann, D Privitera… - arxiv preprint arxiv …, 2024 - arxiv.org

This is the interim publication of the first International Scientific Report on the Safety of
Advanced AI. The report synthesises the scientific understanding of general-purpose AI--AI …

Cited by 12 · 5 versions


The WMDP benchmark: Measuring and reducing malicious use with unlearning

