Why do tree-based models still outperform deep learning on typical tabular data?

L Grinsztajn, E Oyallon… - Advances in neural …, 2022 - proceedings.neurips.cc
While deep learning has enabled tremendous progress on text and image datasets, its
superiority on tabular data is not clear. We contribute extensive benchmarks of standard and …

Auto-sklearn 2.0: Hands-free AutoML via meta-learning

M Feurer, K Eggensperger, S Falkner… - Journal of Machine …, 2022 - jmlr.org
Automated Machine Learning (AutoML) supports practitioners and researchers with the
tedious task of designing machine learning pipelines and has recently achieved substantial …

TabPFN: A Transformer that solves small tabular classification problems in a second

N Hollmann, S Müller, K Eggensperger… - arXiv preprint arXiv …, 2022 - arxiv.org
We present TabPFN, a trained Transformer that can do supervised classification for small
tabular datasets in less than a second, needs no hyperparameter tuning and is competitive …

When do neural nets outperform boosted trees on tabular data?

D McElfresh, S Khandagale… - Advances in …, 2024 - proceedings.neurips.cc
Tabular data is one of the most commonly used types of data in machine learning. Despite
recent advances in neural nets (NNs) for tabular data, there is still an active discussion on …

SubTab: Subsetting features of tabular data for self-supervised representation learning

T Ucar, E Hajiramezanali… - Advances in Neural …, 2021 - proceedings.neurips.cc
Self-supervised learning has been shown to be very effective in learning useful
representations, and yet much of the success is achieved in data types such as images …

SCARF: Self-supervised contrastive learning using random feature corruption

D Bahri, H Jiang, Y Tay, D Metzler - arXiv preprint arXiv:2106.15147, 2021 - arxiv.org
Self-supervised contrastive representation learning has proved incredibly successful in the
vision and natural language domains, enabling state-of-the-art performance with orders of …

AMLB: an AutoML benchmark

P Gijsbers, MLP Bueno, S Coors, E LeDell… - Journal of Machine …, 2024 - jmlr.org
Comparing different AutoML frameworks is notoriously challenging and often done
incorrectly. We introduce an open and extensible benchmark that follows best practices and …

HPOBench: A collection of reproducible multi-fidelity benchmark problems for HPO

K Eggensperger, P Müller, N Mallik, M Feurer… - arXiv preprint arXiv …, 2021 - arxiv.org
To achieve peak predictive performance, hyperparameter optimization (HPO) is a crucial
component of machine learning and its applications. Over the last years, the number of …

Sampling weights of deep neural networks

EL Bolager, I Burak, C Datar, Q Sun… - Advances in Neural …, 2023 - proceedings.neurips.cc
We introduce a probability distribution, combined with an efficient sampling algorithm, for
weights and biases of fully-connected neural networks. In a supervised learning context, no …

A survey on self-supervised learning for non-sequential tabular data

WY Wang, WW Du, D Xu, W Wang, WC Peng - Machine Learning, 2025 - Springer
Self-supervised learning (SSL) has been incorporated into many state-of-the-art models in
various domains, where SSL defines pretext tasks based on unlabeled datasets to learn …