Deep neural networks and tabular data: A survey
Heterogeneous tabular data are the most commonly used form of data and are essential for
numerous critical and computationally demanding applications. On homogeneous datasets …
Generative AI and process systems engineering: The next frontier
This review article explores how emerging generative artificial intelligence (GenAI) models,
such as large language models (LLMs), can enhance solution methodologies within process …
A benchmark for data imputation methods
With the increasing importance and complexity of data pipelines, data quality became one of
the key challenges in modern software applications. The importance of data quality has …
Hyperimpute: Generalized iterative imputation with automatic model selection
Consider the problem of imputing missing values in a dataset. On the one hand,
conventional approaches using iterative imputation benefit from the simplicity and …
Learning to maximize mutual information for dynamic feature selection
Feature selection helps reduce data acquisition costs in ML, but the standard approach is to
train models with static feature subsets. Here, we consider the dynamic feature selection …
Generative table pre-training empowers models for tabular prediction
Recently, the topic of table pre-training has attracted considerable research interest.
However, how to employ table pre-training to boost the performance of tabular prediction …
Identifiable generative models for missing not at random data imputation
Real-world datasets often have missing values associated with complex generative
processes, where the cause of the missingness may not be fully observed. This is known as …
Missdiff: Training diffusion models on tabular data with missing values
The diffusion model has shown remarkable performance in modeling data distributions and
synthesizing data. However, the vanilla diffusion model requires complete or fully observed …
Deep embedded clustering generalisability and adaptation for integrating mixed datatypes: two critical care cohorts
We validated a Deep Embedded Clustering (DEC) model and its adaptation for
integrating mixed datatypes (in this study, numerical and categorical variables). Deep …
Balanced mixed-type tabular data synthesis with diffusion models
Diffusion models have emerged as a robust framework for various generative tasks,
including tabular data synthesis. However, current tabular diffusion models tend to inherit …