DataInf: Efficiently estimating data influence in LoRA-tuned LLMs and diffusion models

Y Kwon, E Wu, K Wu, J Zou - arXiv preprint arXiv:2310.00902, 2023 - arxiv.org
Quantifying the impact of training data points is crucial for understanding the outputs of
machine learning models and for improving the transparency of the AI pipeline. The …

A Survey on Data Markets

J Zhang, Y Bi, M Cheng, J Liu, K Ren, Q Sun… - arXiv preprint arXiv …, 2024 - arxiv.org
Data is the new oil of the 21st century. The growing trend of trading data for greater welfare
has led to the emergence of data markets. A data market is any mechanism whereby the …

Training Data Attribution via Approximate Unrolling

J Bae, W Lin, J Lorraine, RB Grosse - The Thirty-eighth Annual …, 2024 - openreview.net
Many training data attribution (TDA) methods aim to estimate how a model's behavior would
change if one or more data points were removed from the training set. Methods based on …

Training Data Attribution via Approximate Unrolled Differentiation

J Bae, W Lin, J Lorraine, R Grosse - arXiv preprint arXiv:2405.12186, 2024 - arxiv.org
Many training data attribution (TDA) methods aim to estimate how a model's behavior would
change if one or more data points were removed from the training set. Methods based on …

Data value estimation on private gradients

Z Zhou, X Xu, D Rus, BKH Low - arXiv preprint arXiv:2412.17008, 2024 - arxiv.org
For gradient-based machine learning (ML) methods commonly adopted in practice such as
stochastic gradient descent, the de facto differential privacy (DP) technique is perturbing the …

Data Overvaluation Attack and Truthful Data Valuation

S Zheng, S Cai, C Xiao, Y Cao, J Qin… - arXiv preprint arXiv …, 2025 - arxiv.org
In collaborative machine learning, data valuation, i.e., evaluating the contribution of each
client's data to the machine learning model, has become a critical task for incentivizing and …