Tabular data augmentation for machine learning: Progress and prospects of embracing generative AI
Machine learning (ML) on tabular data is ubiquitous, yet obtaining abundant high-quality
tabular data for model training remains a significant obstacle. Numerous works have …
Unsupervised generative feature transformation via graph contrastive pre-training and multi-objective fine-tuning
Feature transformation is to derive a new feature set from original features to augment the AI
power of data. In many science domains such as material performance screening, while …
Towards Data-Centric AI: A Comprehensive Survey of Traditional, Reinforcement, and Generative Approaches for Tabular Data Transformation
Tabular data is one of the most widely used formats across industries, driving critical
applications in areas such as finance, healthcare, and marketing. In the era of data-centric …
Reinforcement Feature Transformation for Polymer Property Performance Prediction
Polymer property performance prediction aims to forecast specific features or attributes of
polymers, which has become an efficient approach to measuring their performance …
Resolving the imbalance issue in hierarchical disciplinary topic inference via LLM-based data augmentation
In addressing the imbalance issue of data within the realm of Natural Language
Processing, text data augmentation methods have emerged as pivotal solutions. This data …
FedGCS: A Generative Framework for Efficient Client Selection in Federated Learning via Gradient-based Optimization
Federated Learning faces significant challenges in statistical and system heterogeneity,
along with high energy consumption, necessitating efficient client selection strategies …
Neuro-Symbolic Embedding for Short and Effective Feature Selection via Autoregressive Generation
Feature selection aims to identify the optimal feature subset for enhancing downstream
models. Effective feature selection can remove redundant features, save computational …
Tabular Data-centric AI: Challenges, Techniques and Future Perspectives
Tabular data are the most widely used data formats in almost every application domain,
such as biology, ecology, and material science. The purpose of tabular data-centric AI is to …
A Comprehensive Survey on Data Augmentation
Data augmentation is a series of techniques that generate high-quality artificial data by
manipulating existing data samples. By leveraging data augmentation techniques, AI …
Evolutionary Large Language Model for Automated Feature Transformation
Feature transformation aims to reconstruct the feature space of raw features to enhance the
performance of downstream models. However, the exponential growth in the combinations …