Open Graph Benchmark: Datasets for machine learning on graphs
We present the Open Graph Benchmark (OGB), a diverse set of challenging and
realistic benchmark datasets to facilitate scalable, robust, and reproducible graph machine …
TabPFN: A transformer that solves small tabular classification problems in a second
We present TabPFN, a trained Transformer that can do supervised classification for small
tabular datasets in less than a second, needs no hyperparameter tuning and is competitive …
Confident learning: Estimating uncertainty in dataset labels
Learning exists in the context of data, yet notions of confidence typically focus on model
predictions, not label quality. Confident learning (CL) is an alternative approach which …
Auto-sklearn 2.0: Hands-free AutoML via meta-learning
Automated Machine Learning (AutoML) supports practitioners and researchers with the
tedious task of designing machine learning pipelines and has recently achieved substantial …
Well-tuned simple nets excel on tabular datasets
Tabular datasets are the last "unconquered castle" for deep learning, with traditional ML
methods like Gradient-Boosted Decision Trees still performing strongly even against recent …
Auto-PyTorch: Multi-fidelity metalearning for efficient and robust AutoDL
While early AutoML frameworks focused on optimizing traditional ML pipelines and their
hyperparameters, a recent trend in AutoML is to focus on neural architecture search. In this …
Sampling weights of deep neural networks
We introduce a probability distribution, combined with an efficient sampling algorithm, for
weights and biases of fully-connected neural networks. In a supervised learning context, no …
Data-OOB: Out-of-bag estimate as a simple and efficient data value
Data valuation is a powerful framework for providing statistical insights into which data are
beneficial or detrimental to model training. Many Shapley-based data valuation methods …
shapiq: Shapley interactions for machine learning
Originally rooted in game theory, the Shapley Value (SV) has recently become an important
tool in machine learning research. Perhaps most notably, it is used for feature attribution and …
Large language models for automated data science: Introducing CAAFE for context-aware automated feature engineering
As the field of automated machine learning (AutoML) advances, it becomes increasingly
important to incorporate domain knowledge into these systems. We present an approach for …