TabDDPM: Modelling tabular data with diffusion models

A Kotelnikov, D Baranchuk… - International …, 2023 - proceedings.mlr.press
Denoising diffusion probabilistic models are becoming the leading generative modeling
paradigm for many important data modalities. Being the most prevalent in the computer …

TabMT: Generating tabular data with masked transformers

M Gulati, P Roysdon - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Autoregressive and Masked Transformers are incredibly effective as generative
models and classifiers. While these models are most prevalent in NLP, they also exhibit …

Conditional Generative Models for Synthetic Tabular Data: Applications for Precision Medicine and Diverse Representations

K Liu, RB Altman - Annual Review of Biomedical Data Science, 2025 - annualreviews.org
Tabular medical datasets, like electronic health records (EHRs), biobanks, and structured
clinical trial data, are rich sources of information with the potential to advance precision …

REaLTabFormer: Generating realistic relational and tabular data using transformers

AV Solatorio, O Dupriez - arXiv preprint arXiv:2302.02041, 2023 - arxiv.org
Tabular data is a common form of organizing data. Multiple models are available to generate
synthetic tabular datasets where observations are independent, but few have the ability to …

Generating and imputing tabular data via diffusion and flow-based gradient-boosted trees

A Jolicoeur-Martineau, K Fatras… - International …, 2024 - proceedings.mlr.press
Tabular data is hard to acquire and is subject to missing values. This paper introduces a
novel approach for generating and imputing mixed-type (continuous and categorical) tabular …

Preserving privacy in healthcare: A systematic review of deep learning approaches for synthetic data generation

Y Liu, UR Acharya, JH Tan - Computer Methods and Programs in …, 2024 - Elsevier
Background: Data sharing in healthcare is vital for advancing research and personalized
medicine. However, the process is hindered by privacy, ethical, and legal challenges …

A novel machine learning framework for efficient calibration of complex DEM model: A case study of a conglomerate sample

J Shentu, B Lin - Engineering Fracture Mechanics, 2023 - Elsevier
The conglomerate reservoirs in the Mahu Sag of the Junggar Basin, northwestern China, are
characterized by high heterogeneity and complicated lithology. The reservoirs have experienced a …

Mimicking clinical trials with synthetic acute myeloid leukemia patients using generative artificial intelligence

JN Eckardt, W Hahn, C Röllig, S Stasik… - npj Digital …, 2024 - nature.com
Clinical research relies on high-quality patient data; however, obtaining big data sets is
costly and access to existing data is often hindered by privacy and regulatory concerns …

A universal metric for robust evaluation of synthetic tabular data

VS Chundawat, AK Tarun, M Mandal… - IEEE Transactions …, 2022 - ieeexplore.ieee.org
Synthetic tabular data generation becomes crucial when real data are limited, expensive to
collect, or simply cannot be used due to privacy concerns. However, producing good quality …

EPIC: Effective Prompting for Imbalanced-Class Data Synthesis in Tabular Data Classification via Large Language Models

J Kim, T Kim, J Choo - Advances in Neural Information …, 2025 - proceedings.neurips.cc
Large language models (LLMs) have demonstrated remarkable in-context learning
capabilities across diverse applications. In this work, we explore the effectiveness of LLMs …