MUSE: Machine Unlearning Six-Way Evaluation for Language Models

W Shi, J Lee, Y Huang, S Malladi, J Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models (LMs) are trained on vast amounts of text data, which may include private
and copyrighted content. Data owners may request the removal of their data from a trained …

Evaluating copyright takedown methods for language models

B Wei, W Shi, Y Huang, NA Smith, C Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models (LMs) derive their capabilities from extensive training on diverse data,
including potentially copyrighted material. These models can memorize and generate …

An economic solution to copyright challenges of generative AI

JT Wang, Z Deng, H Chiba-Okabe, B Barak… - arXiv preprint arXiv …, 2024 - arxiv.org
Generative artificial intelligence (AI) systems are trained on large data corpora to generate
new pieces of text, images, videos, and other media. There is growing concern that such …

On memorization of large language models in logical reasoning

C **e, Y Huang, C Zhang, D Yu, X Chen, BY Lin… - arxiv preprint arxiv …, 2024 - arxiv.org
Large language models (LLMs) achieve good performance on challenging reasoning
benchmarks, yet could also make basic reasoning mistakes. This contrasting behavior is …

CopyBench: Measuring literal and non-literal reproduction of copyright-protected text in language model generation

T Chen, A Asai, N Mireshghallah, S Min… - arXiv preprint arXiv …, 2024 - arxiv.org
Evaluating the degree of reproduction of copyright-protected content by language models
(LMs) is of significant interest to the AI and legal communities. Although both literal and non …

Tackling GenAI copyright issues: Originality estimation and genericization

H Chiba-Okabe, WJ Su - arXiv preprint arXiv:2406.03341, 2024 - arxiv.org
The rapid progress of generative AI technology has sparked significant copyright concerns,
leading to numerous lawsuits filed against AI developers. While various techniques for …

Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy

Y Huang, D Liu, L Chua, B Ghazi, P Kamath… - arXiv preprint arXiv …, 2024 - arxiv.org
Machine unlearning algorithms, designed for selective removal of training data from models,
have emerged as a promising approach to growing privacy concerns. In this work, we …

Provable unlearning in topic modeling and downstream tasks

S Wei, S Malladi, S Arora, A Sanyal - arXiv preprint arXiv:2411.12600, 2024 - arxiv.org
Machine unlearning algorithms are increasingly important as legal concerns arise around
the provenance of training data, but verifying the success of unlearning is often difficult …

Semantic to Structure: Learning Structural Representations for Infringement Detection

C Huang, Z Jia, H Fei, Y Zhu, Z Yuan, J Zhang… - arXiv preprint arXiv …, 2025 - arxiv.org
Structural information in images is crucial for aesthetic assessment, and it is widely
recognized in the artistic field that imitating the structure of other works significantly infringes …

The Pitfalls of" Security by Obscurity" And What They Mean for Transparent AI

P Hall, O Mundahl, S Park - arXiv preprint arXiv:2501.18669, 2025 - arxiv.org
Calls for transparency in AI systems are growing in number and urgency from diverse
stakeholders ranging from regulators to researchers to users (with a comparative absence of …