TabDDPM: Modelling tabular data with diffusion models
A Kotelnikov, D Baranchuk… - International …, 2023 - proceedings.mlr.press
Denoising diffusion probabilistic models are becoming the leading generative modeling
paradigm for many important data modalities. Being the most prevalent in the computer …
TabMT: Generating tabular data with masked transformers
Abstract Autoregressive and Masked Transformers are incredibly effective as generative
models and classifiers. While these models are most prevalent in NLP, they also exhibit …
Conditional Generative Models for Synthetic Tabular Data: Applications for Precision Medicine and Diverse Representations
Tabular medical datasets, like electronic health records (EHRs), biobanks, and structured
clinical trial data, are rich sources of information with the potential to advance precision …
REaLTabFormer: Generating realistic relational and tabular data using transformers
AV Solatorio, O Dupriez - arXiv preprint arXiv:2302.02041, 2023 - arxiv.org
Tabular data is a common form of organizing data. Multiple models are available to generate
synthetic tabular datasets where observations are independent, but few have the ability to …
Generating and imputing tabular data via diffusion and flow-based gradient-boosted trees
A Jolicoeur-Martineau, K Fatras… - International …, 2024 - proceedings.mlr.press
Tabular data is hard to acquire and is subject to missing values. This paper introduces a
novel approach for generating and imputing mixed-type (continuous and categorical) tabular …
Preserving privacy in healthcare: A systematic review of deep learning approaches for synthetic data generation
Y Liu, UR Acharya, JH Tan - Computer Methods and Programs in …, 2024 - Elsevier
Background: Data sharing in healthcare is vital for advancing research and personalized
medicine. However, the process is hindered by privacy, ethical, and legal challenges …
A novel machine learning framework for efficient calibration of complex DEM model: A case study of a conglomerate sample
The conglomerate reservoirs in the Mahu Sag of the Junggar Basin, northwestern China,
feature high heterogeneity and complicated lithology. The reservoirs have experienced a …
Mimicking clinical trials with synthetic acute myeloid leukemia patients using generative artificial intelligence
Clinical research relies on high-quality patient data, however, obtaining big data sets is
costly and access to existing data is often hindered by privacy and regulatory concerns …
A universal metric for robust evaluation of synthetic tabular data
Synthetic tabular data generation becomes crucial when real data are limited, expensive to
collect, or simply cannot be used due to privacy concerns. However, producing good quality …
EPIC: Effective Prompting for Imbalanced-Class Data Synthesis in Tabular Data Classification via Large Language Models
Large language models (LLMs) have demonstrated remarkable in-context learning
capabilities across diverse applications. In this work, we explore the effectiveness of LLMs …