Machine learning for synthetic data generation: a review

Y Lu, M Shen, H Wang, X Wang, C van Rechem… - arXiv preprint …

A scoping review of privacy and utility metrics in medical synthetic data

B Kaabachi, J Despraz, T Meurers, K Otte… - NPJ digital …, 2025 - nature.com
The use of synthetic data is a promising solution to facilitate the sharing and reuse of health-
related data beyond its initial collection while addressing privacy concerns. However, there …

Private synthetic data for multitask learning and marginal queries

G Vietri, C Archambeau, S Aydore… - Advances in …, 2022 - proceedings.neurips.cc
We provide a differentially private algorithm for producing synthetic data simultaneously
useful for multiple tasks: marginal queries and multitask machine learning (ML). A key …
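
A k-way marginal query counts how often each combination of values occurs on a chosen set of k attributes. The Python sketch below computes a normalized marginal over a list of categorical records. The record format, the example data, and the function name `marginal` are illustrative assumptions. They are not the interface of the Vietri et al. algorithm.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def marginal(records: List[Record], attrs: Sequence[str]) -> Dict[Tuple[str, ...], float]:
    """Normalized k-way marginal over `attrs` (k = len(attrs)): the fraction
    of records taking each observed combination of values."""
    n = len(records)
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    return {cell: c / n for cell, c in counts.items()}


# Hypothetical toy dataset with three categorical attributes.
data = [
    {"age": "30-39", "smoker": "yes", "region": "north"},
    {"age": "30-39", "smoker": "no", "region": "south"},
    {"age": "40-49", "smoker": "no", "region": "north"},
    {"age": "30-39", "smoker": "yes", "region": "south"},
]
# 2-way marginal on (age, smoker)
print(marginal(data, ["age", "smoker"]))
# {('30-39', 'yes'): 0.5, ('30-39', 'no'): 0.25, ('40-49', 'no'): 0.25}
```

Synthetic data are "useful for marginal queries" when answers like these, computed on the synthetic records, stay close to the answers on the original records.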

Generating private synthetic data with genetic algorithms

T Liu, J Tang, G Vietri, S Wu - International Conference on …, 2023 - proceedings.mlr.press
We study the problem of efficiently generating differentially private synthetic data that
approximate the statistical properties of an underlying sensitive dataset. In recent years …
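
One common way to quantify how closely synthetic data approximate the statistical properties of the sensitive data is the maximum absolute error over a workload of k-way marginals. The sketch below assumes that metric and a workload of all k-way marginals. `max_marginal_error` is a hypothetical evaluation helper. It is not the genetic algorithm of Liu et al., nor necessarily the error measure they report.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]
Marginal = Dict[Tuple[str, ...], float]


def marginal(records: List[Record], attrs: Sequence[str]) -> Marginal:
    # Normalized counts: fraction of records in each observed value combination.
    n = len(records)
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    return {cell: c / n for cell, c in counts.items()}


def max_marginal_error(real: List[Record], synth: List[Record], k: int = 2) -> float:
    """Max absolute difference between real and synthetic normalized counts,
    over every cell of every k-way marginal on the real data's attributes."""
    attrs = sorted(real[0].keys())
    worst = 0.0
    for subset in combinations(attrs, k):
        p = marginal(real, subset)
        q = marginal(synth, subset)
        # A cell missing from one side has frequency 0 there.
        for cell in p.keys() | q.keys():
            worst = max(worst, abs(p.get(cell, 0.0) - q.get(cell, 0.0)))
    return worst
```

A value of 0 means every k-way marginal is reproduced exactly. Larger values mean the synthetic data distort at least one low-order statistic.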

Post-processing private synthetic data for improving utility on selected measures

H Wang, S Sudalairaj, J Henning… - Advances in …, 2024 - proceedings.neurips.cc
Existing private synthetic data generation algorithms are agnostic to downstream tasks.
However, end users may have specific requirements that the synthetic data must satisfy …

Graphical vs. Deep Generative Models: Measuring the Impact of Differentially Private Mechanisms and Budgets on Utility

G Ganev, K Xu, E De Cristofaro - Proceedings of the 2024 on ACM …, 2024 - dl.acm.org
Generative models trained with Differential Privacy (DP) can produce synthetic data while
reducing privacy risks. However, navigating their privacy-utility tradeoffs makes finding the …

An optimal and scalable matrix mechanism for noisy marginals under convex loss functions

Y Xiao, G He, D Zhang, D Kifer - Advances in Neural …, 2024 - proceedings.neurips.cc
Noisy marginals are a common form of confidentiality-protecting data release and are useful
for many downstream tasks such as contingency table analysis, construction of Bayesian …
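
In its simplest form, a noisy marginal can be produced with the Laplace mechanism. The sketch below is that textbook mechanism, not the optimal matrix mechanism of Xiao et al. It assumes add/remove-one-record neighboring datasets, which give an L1 sensitivity of 1, and a public, explicitly supplied attribute domain. Names such as `noisy_marginal` are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, str]
Cell = Tuple[str, ...]


def laplace(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_marginal(
    records: List[Record],
    attrs: Sequence[str],
    domain: Sequence[Cell],
    epsilon: float,
    seed: Optional[int] = None,
) -> Dict[Cell, float]:
    """Laplace-mechanism release of the raw counts of one marginal.

    Adding or removing one record changes exactly one cell by 1, so the
    L1 sensitivity is 1 and Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy under add/remove-one neighbors.
    """
    rng = random.Random(seed)
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    # Noise every cell of the public domain, including empty ones.
    return {cell: counts.get(cell, 0) + laplace(1.0 / epsilon, rng) for cell in domain}
```

Noise is added to every cell of the domain, including empty ones, because releasing only the observed cells would reveal which value combinations occur in the data. The floating-point sampler here is only for illustration. Production differential-privacy libraries use samplers hardened against floating-point attacks.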

Towards principled assessment of tabular data synthesis algorithms

Y Du, N Li - arXiv preprint arXiv:2402.06806, 2024 - arxiv.org
Data synthesis has been advocated as an important approach for utilizing data while
protecting data privacy. A large number of tabular data synthesis algorithms (which we call …