Machine learning for synthetic data generation: a review

Y Lu, M Shen, H Wang, X Wang, C van Rechem… - arXiv preprint arXiv …, 2023 - arxiv.org
Machine learning heavily relies on data, but real-world applications often encounter various
data-related issues. These include data of poor quality, insufficient data points leading to …

A causal perspective on dataset bias in machine learning for medical imaging

C Jones, DC Castro, F De Sousa Ribeiro… - Nature Machine …, 2024 - nature.com
As machine learning methods gain prominence within clinical decision-making, the need to
address fairness concerns becomes increasingly urgent. Despite considerable work …

DCFace: Synthetic face generation with dual condition diffusion model

M Kim, F Liu, A Jain, X Liu - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Generating synthetic datasets for training face recognition models is challenging because
dataset generation entails more than creating high fidelity images. It involves generating …

Synthetic Data--what, why and how?

J Jordon, L Szpruch, F Houssiau, M Bottarelli… - arXiv preprint arXiv …, 2022 - arxiv.org
This explainer document aims to provide an overview of the current state of the rapidly
expanding work on synthetic data technologies, with a particular focus on privacy. The …

Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models

G Stein, J Cresswell, R Hosseinzadeh… - Advances in …, 2023 - proceedings.neurips.cc
We systematically study a wide variety of generative models spanning semantically-diverse
image datasets to understand and improve the feature extractors and metrics used to …

Generalization—a key challenge for responsible AI in patient-facing clinical applications

L Goetz, N Seedat, R Vandersluis… - npj Digital …, 2024 - nature.com
Generalization–the ability of AI systems to apply and/or extrapolate their knowledge to new
data which might differ from the original training data–is a major challenge for the effective …

Synthetic data, real errors: how (not) to publish and use synthetic data

B van Breugel, Z Qian… - … on Machine Learning, 2023 - proceedings.mlr.press
Generating synthetic data through generative models is gaining interest in the ML
community and beyond, promising a future where datasets can be tailored to individual …

GOGGLE: Generative modelling for tabular data by learning relational structure

T Liu, Z Qian, J Berrevoets… - … Conference on Learning …, 2023 - openreview.net
Deep generative models learn highly complex and non-linear representations to generate
realistic synthetic data. While they have achieved notable success in computer vision and …

Reimagining synthetic tabular data generation through data-centric AI: A comprehensive benchmark

L Hansen, N Seedat… - Advances in Neural …, 2023 - proceedings.neurips.cc
Synthetic data serves as an alternative in training machine learning models, particularly
when real-world data is limited or inaccessible. However, ensuring that synthetic data …

Can you rely on your model evaluation? Improving model evaluation with synthetic test data

B van Breugel, N Seedat, F Imrie… - Advances in Neural …, 2023 - proceedings.neurips.cc
Evaluating the performance of machine learning models on diverse and underrepresented
subgroups is essential for ensuring fairness and reliability in real-world applications …