Synthetic data–anonymisation groundhog day

T Stadler, B Oprisanu, C Troncoso - 31st USENIX Security Symposium …, 2022 - usenix.org
Synthetic data has been advertised as a silver-bullet solution to privacy-preserving data
publishing that addresses the shortcomings of traditional anonymisation techniques. The …

Benchmarking differentially private synthetic data generation algorithms

Y Tao, R McKenna, M Hay, A Machanavajjhala… - arXiv preprint arXiv …, 2021 - arxiv.org
This work presents a systematic benchmark of differentially private synthetic data generation
algorithms that can generate tabular data. Utility of the synthetic data is evaluated by …

Winning the NIST Contest: A scalable and general approach to differentially private synthetic data

R McKenna, G Miklau, D Sheldon - arXiv preprint arXiv:2108.04978, 2021 - arxiv.org
We propose a general approach for differentially private synthetic data generation, that
consists of three steps: (1) select a collection of low-dimensional marginals, (2) measure …

Can I trust my fake data–A comprehensive quality assessment framework for synthetic tabular data in healthcare

VB Vallevik, A Babic, SE Marshall, E Severin… - International Journal of …, 2024 - Elsevier
Background: Ensuring safe adoption of AI tools in healthcare hinges on access to sufficient
data for training, testing and validation. Synthetic data has been suggested in response to …

Differentially private synthetic data: Applied evaluations and enhancements

L Rosenblatt, X Liu, S Pouyanfar, E de Leon… - arXiv preprint arXiv …, 2020 - arxiv.org
Machine learning practitioners frequently seek to leverage the most informative available
data, without violating the data owner's privacy, when building predictive models …

Synthetic data for privacy-preserving clinical risk prediction

Z Qian, T Callender, B Cebere, SM Janes, N Navani… - Scientific Reports, 2024 - nature.com
Synthetic data promise privacy-preserving data sharing for healthcare research and
development. Compared with other privacy-enhancing approaches—such as federated …

Comparative study of differentially private synthetic data algorithms from the NIST PSCR differential privacy synthetic data challenge

CMK Bowen, J Snoke - arXiv preprint arXiv:1911.12704, 2019 - arxiv.org
Differentially private synthetic data generation offers a recent solution to release analytically
useful data while preserving the privacy of individuals in the data. In order to utilize these …

Graphical vs. Deep Generative Models: Measuring the Impact of Differentially Private Mechanisms and Budgets on Utility

G Ganev, K Xu, E De Cristofaro - Proceedings of the 2024 on ACM …, 2024 - dl.acm.org
Generative models trained with Differential Privacy (DP) can produce synthetic data while
reducing privacy risks. However, navigating their privacy-utility tradeoffs makes finding the …

Towards principled assessment of tabular data synthesis algorithms

Y Du, N Li - arXiv preprint arXiv:2402.06806, 2024 - arxiv.org
Data synthesis has been advocated as an important approach for utilizing data while
protecting data privacy. A large number of tabular data synthesis algorithms (which we call …

30 years of synthetic data

J Drechsler, AC Haensch - Statistical Science, 2024 - projecteuclid.org
The idea to generate synthetic data as a tool for broadening access to sensitive microdata
has been proposed for the first time three decades ago. While first applications of the idea …