What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions

SK Choe, H Ahn, J Bae, K Zhao, M Kang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are trained on a vast amount of human-written data, but data
providers often remain uncredited. In response to this issue, data valuation (or data …

Data Acquisition for Improving Model Confidence

Y Li, X Yu, N Koudas - Proceedings of the ACM on Management of Data, 2024 - dl.acm.org
In recent years, there has been a growing recognition that high-quality training data is
crucial for the performance of machine learning models. This awareness has catalyzed both …

A Data-Centric Online Market for Machine Learning: From Discovery to Pricing

M Han, J Light, S Xia, S Galhotra… - arXiv preprint arXiv …, 2023 - arxiv.org
Data fuels machine learning (ML): rich and high-quality training data is essential to the
success of ML. However, to transform ML from the race among a few large corporations to …

A Survey on Data Markets

J Zhang, Y Bi, M Cheng, J Liu, K Ren, Q Sun… - arXiv preprint arXiv …, 2024 - arxiv.org
Data is the new oil of the 21st century. The growing trend of trading data for greater welfare
has led to the emergence of data markets. A data market is any mechanism whereby the …

ALI-DPFL: Differentially Private Federated Learning with Adaptive Local Iterations

X Ling, J Fu, K Wang, H Liu… - 2024 IEEE 25th …, 2024 - ieeexplore.ieee.org
Federated Learning (FL) is a distributed machine learning technique that allows model
training among multiple devices or organizations by sharing training parameters instead of …

2D-OOB: Attributing Data Contribution Through Joint Valuation Framework

Y Sun, J Shen, Y Kwon - arXiv preprint arXiv:2408.03572, 2024 - arxiv.org
Data valuation has emerged as a powerful framework for quantifying each datum's
contribution to the training of a machine learning model. However, it is crucial to recognize …

DAVED: Data Acquisition via Experimental Design for Data Markets

C Lu, B Huang, SP Karimireddy, P Vepakomma… - arXiv preprint arXiv …, 2024 - arxiv.org
The acquisition of training data is crucial for machine learning applications. Data markets
can increase the supply of data, particularly in data-scarce domains such as healthcare, by …

Understanding Training Data in Large-Scale Machine Learning

SK Choe - 2024 - kilthub.cmu.edu
As the capabilities of large-scale machine learning (ML) systems rapidly improve, reliable
development & deployment of these systems is increasingly gaining attention. Based on the …

Data Acquisition via Experimental Design for Data Markets

C Lu, B Huang, SP Karimireddy, P Vepakomma… - The Thirty-eighth Annual … - openreview.net
The acquisition of training data is crucial for machine learning applications. Data markets
can increase the supply of data, particularly in data-scarce domains such as healthcare, by …