Large Language Models (LLMs) on Tabular Data: Prediction, Generation, and Understanding--A Survey

X Fang, W Xu, FA Tan, J Zhang, Z Hu, Y Qi… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent breakthroughs in large language modeling have facilitated rigorous exploration of
their application in diverse tasks related to tabular data modeling, such as prediction, tabular …

A comprehensive survey on data augmentation

Z Wang, P Wang, K Liu, P Wang, Y Fu, CT Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
Data augmentation is a series of techniques that generate high-quality artificial data by
manipulating existing data samples. By leveraging data augmentation techniques, AI …

Mixed-type tabular data synthesis with score-based diffusion in latent space

H Zhang, J Zhang, B Srinivasan, Z Shen, X Qin… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in tabular data generation have greatly enhanced synthetic data quality.
However, extending diffusion models to tabular data is challenging due to the intricately …

Synthcity: a benchmark framework for diverse use cases of tabular synthetic data

Z Qian, R Davis… - Advances in neural …, 2023 - proceedings.neurips.cc
Accessible high-quality data is the bread and butter of machine learning research, and the
demand for data has exploded as larger and more advanced ML models are built across …

Causal deep learning

J Berrevoets, K Kacprzyk, Z Qian… - arXiv preprint arXiv …, 2023 - arxiv.org
Causality has the potential to truly transform the way we solve a large number of real-world
problems. Yet, so far, its potential largely remains to be unlocked as causality often requires …

ClavaDDPM: Multi-relational data synthesis with cluster-guided diffusion models

W Pang, M Shafieinejad, L Liu… - Advances in Neural …, 2025 - proceedings.neurips.cc
Recent research in tabular data synthesis has focused on single tables, whereas real-world
applications often involve complex data with tens or hundreds of interconnected tables …

Creating Artificial Students that Never Existed: Leveraging Large Language Models and CTGANs for Synthetic Data Generation

M Khalil, F Vadiee, R Shakya, Q Liu - Proceedings of the 15th …, 2025 - dl.acm.org
In this study, we explore the growing potential of AI and deep learning technologies,
particularly Generative Adversarial Networks (GANs) and Large Language Models (LLMs) …

SurvivalGAN: Generating time-to-event data for survival analysis

A Norcliffe, B Cebere, F Imrie, P Lio… - International …, 2023 - proceedings.mlr.press
Synthetic data is becoming an increasingly promising technology, and successful
applications can improve privacy, fairness, and data democratization. While there are many …

How realistic is your synthetic data? Constraining deep generative models for tabular data

MC Stoian, S Dyrmishi, M Cordy, T Lukasiewicz… - arXiv preprint arXiv …, 2024 - arxiv.org
Deep Generative Models (DGMs) have been shown to be powerful tools for generating
tabular data, as they have been increasingly able to capture the complex distributions that …

Causal deep learning: encouraging impact on real-world problems through causality

J Berrevoets, K Kacprzyk, Z Qian… - … and Trends® in …, 2024 - nowpublishers.com
Causality has the potential to truly transform the way we solve a large number of real-world
problems. Yet, so far, its potential largely remains to be unlocked as causality often requires …