Tabular data augmentation for machine learning: Progress and prospects of embracing generative AI

L Cui, H Li, K Chen, L Shou, G Chen - arXiv preprint arXiv:2407.21523, 2024 - arxiv.org
Machine learning (ML) on tabular data is ubiquitous, yet obtaining abundant high-quality
tabular data for model training remains a significant obstacle. Numerous works have …

Unsupervised generative feature transformation via graph contrastive pre-training and multi-objective fine-tuning

W Ying, D Wang, X Hu, Y Zhou, CC Aggarwal… - Proceedings of the 30th …, 2024 - dl.acm.org
Feature transformation is to derive a new feature set from original features to augment the AI
power of data. In many science domains such as material performance screening, while …

Towards Data-Centric AI: A Comprehensive Survey of Traditional, Reinforcement, and Generative Approaches for Tabular Data Transformation

D Wang, Y Huang, W Ying, H Bai, N Gong… - arXiv preprint arXiv …, 2025 - arxiv.org
Tabular data is one of the most widely used formats across industries, driving critical
applications in areas such as finance, healthcare, and marketing. In the era of data-centric …

Reinforcement Feature Transformation for Polymer Property Performance Prediction

X Hu, D Wang, W Ying, Y Fu - … of the 33rd ACM International Conference …, 2024 - dl.acm.org
Polymer property performance prediction aims to forecast specific features or attributes of
polymers, which has become an efficient approach to measuring their performance …

Resolving the imbalance issue in hierarchical disciplinary topic inference via LLM-based data augmentation

X Cai, M Xiao, Z Ning, Y Zhou - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
In addressing the imbalanced issue of data within the realm of Natural Language
Processing, text data augmentation methods have emerged as pivotal solutions. This data …

FedGCS: A Generative Framework for Efficient Client Selection in Federated Learning via Gradient-based Optimization

Z Ning, C Tian, M Xiao, W Fan, P Wang, L Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Federated Learning faces significant challenges in statistical and system heterogeneity,
along with high energy consumption, necessitating efficient client selection strategies …

Neuro-Symbolic Embedding for Short and Effective Feature Selection via Autoregressive Generation

N Gong, W Ying, D Wang, Y Fu - arXiv preprint arXiv:2404.17157, 2024 - arxiv.org
Feature selection aims to identify the optimal feature subset for enhancing downstream
models. Effective feature selection can remove redundant features, save computational …

Tabular Data-centric AI: Challenges, Techniques and Future Perspectives

Y Fu, D Wang, H Xiong, K Liu - Proceedings of the 33rd ACM …, 2024 - dl.acm.org
Tabular data are the most widely used data formats in almost every application domain,
such as biology, ecology, and material science. The purpose of tabular data-centric AI is to …

A Comprehensive Survey on Data Augmentation

Z Wang, P Wang, K Liu, P Wang, Y Fu, CT Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
Data augmentation is a series of techniques that generate high-quality artificial data by
manipulating existing data samples. By leveraging data augmentation techniques, AI …

Evolutionary Large Language Model for Automated Feature Transformation

N Gong, CK Reddy, W Ying, Y Fu - arXiv preprint arXiv:2405.16203, 2024 - arxiv.org
Feature transformation aims to reconstruct the feature space of raw features to enhance the
performance of downstream models. However, the exponential growth in the combinations …