DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models
Quantifying the impact of training data points is crucial for understanding the outputs of
machine learning models and for improving the transparency of the AI pipeline. The …
A Survey on Data Markets
Data is the new oil of the 21st century. The growing trend of trading data for greater welfare
has led to the emergence of data markets. A data market is any mechanism whereby the …
Training Data Attribution via Approximate Unrolling
Many training data attribution (TDA) methods aim to estimate how a model's behavior would
change if one or more data points were removed from the training set. Methods based on …
Training Data Attribution via Approximate Unrolled Differentiation
Many training data attribution (TDA) methods aim to estimate how a model's behavior would
change if one or more data points were removed from the training set. Methods based on …
Data Value Estimation on Private Gradients
For gradient-based machine learning (ML) methods commonly adopted in practice such as
stochastic gradient descent, the de facto differential privacy (DP) technique is perturbing the …
Data Overvaluation Attack and Truthful Data Valuation
In collaborative machine learning, data valuation, i.e., evaluating the contribution of each
client's data to the machine learning model, has become a critical task for incentivizing and …