Deep neural networks and tabular data: A survey

V Borisov, T Leemann, K Seßler, J Haug… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Heterogeneous tabular data are the most commonly used form of data and are essential for
numerous critical and computationally demanding applications. On homogeneous datasets …

Generative AI and process systems engineering: The next frontier

B Decardi-Nelson, AS Alshehri, A Ajagekar… - Computers & Chemical …, 2024 - Elsevier
This review article explores how emerging generative artificial intelligence (GenAI) models,
such as large language models (LLMs), can enhance solution methodologies within process …

A benchmark for data imputation methods

S Jäger, A Allhorn, F Bießmann - Frontiers in Big Data, 2021 - frontiersin.org
With the increasing importance and complexity of data pipelines, data quality became one of
the key challenges in modern software applications. The importance of data quality has …

HyperImpute: Generalized iterative imputation with automatic model selection

D Jarrett, BC Cebere, T Liu, A Curth… - International …, 2022 - proceedings.mlr.press
Consider the problem of imputing missing values in a dataset. On the one hand,
conventional approaches using iterative imputation benefit from the simplicity and …

Learning to maximize mutual information for dynamic feature selection

IC Covert, W Qiu, M Lu, NY Kim… - International …, 2023 - proceedings.mlr.press
Feature selection helps reduce data acquisition costs in ML, but the standard approach is to
train models with static feature subsets. Here, we consider the dynamic feature selection …

Generative table pre-training empowers models for tabular prediction

T Zhang, S Wang, S Yan, J Li, Q Liu - arXiv preprint arXiv:2305.09696, 2023 - arxiv.org
Recently, the topic of table pre-training has attracted considerable research interest.
However, how to employ table pre-training to boost the performance of tabular prediction …

Identifiable generative models for missing not at random data imputation

C Ma, C Zhang - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Real-world datasets often have missing values associated with complex generative
processes, where the cause of the missingness may not be fully observed. This is known as …

MissDiff: Training diffusion models on tabular data with missing values

Y Ouyang, L Xie, C Li, G Cheng - arXiv preprint arXiv:2307.00467, 2023 - arxiv.org
The diffusion model has shown remarkable performance in modeling data distributions and
synthesizing data. However, the vanilla diffusion model requires complete or fully observed …

Deep embedded clustering generalisability and adaptation for integrating mixed datatypes: two critical care cohorts

JWTM de Kok, F van Rosmalen, J Koeze, F Keus… - Scientific Reports, 2024 - nature.com
We validated a Deep Embedded Clustering (DEC) model and its adaptation for
integrating mixed datatypes (in this study, numerical and categorical variables). Deep …

Balanced mixed-type tabular data synthesis with diffusion models

Z Yang, H Yu, P Guo, K Zanna, X Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have emerged as a robust framework for various generative tasks,
including tabular data synthesis. However, current tabular diffusion models tend to inherit …