Reimagining synthetic tabular data generation through data-centric AI: A comprehensive benchmark

L Hansen, N Seedat… - Advances in neural …, 2023 - proceedings.neurips.cc
Synthetic data serves as an alternative in training machine learning models, particularly
when real-world data is limited or inaccessible. However, ensuring that synthetic data …

Graph anomaly detection with few labels: A data-centric approach

X Ma, R Li, F Liu, K Ding, J Yang, J Wu - Proceedings of the 30th ACM …, 2024 - dl.acm.org
Anomalous node detection in a static graph faces significant challenges due to the rarity of
anomalies and the substantial cost of labeling their deviant structure and attribute patterns …

Curated LLM: Synergy of LLMs and data curation for tabular augmentation in low-data regimes

N Seedat, N Huynh, B Van Breugel… - arXiv preprint arXiv …, 2023 - arxiv.org
Machine Learning (ML) in low-data settings remains an underappreciated yet crucial
problem. Hence, data augmentation methods to increase the sample size of datasets …

Matchmaker: Self-Improving Large Language Model Programs for Schema Matching

N Seedat, M van der Schaar - arXiv preprint arXiv:2410.24105, 2024 - arxiv.org
Schema matching--the task of finding matches between attributes across disparate data
sources with different tables and hierarchies--is critical for creating interoperable machine …

Towards Human-Guided, Data-Centric LLM Co-Pilots

E Saveliev, J Liu, N Seedat, A Boyd… - arXiv preprint arXiv …, 2025 - arxiv.org
Machine learning (ML) has the potential to revolutionize various domains, but its adoption is
often hindered by the disconnect between the needs of domain experts and translating …

A Comparative Study of Bug Triage Representation and Classification Approaches from Canonical to Large Language Models

FT Da Silva, FR De Araújo… - 2024 5th International …, 2024 - ieeexplore.ieee.org
Bug triage is the task of assigning newly reported bugs to the proper developers or team for
resolution. This is a critical point in software maintenance as it directly influences the time …

Matchmaker: Self-Improving Compositional LLM Programs for Table Schema Matching

N Seedat, M van der Schaar - NeurIPS 2024 Third Table Representation … - openreview.net
Schema matching--the task of finding matches between attributes across disparate data
sources with different tables and hierarchies--is critical for creating interoperable machine …