A review of the Gumbel-max trick and its extensions for discrete stochasticity in machine learning

IAM Huijben, W Kool, MB Paulus… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
The Gumbel-max trick is a method to draw a sample from a categorical distribution, given by
its unnormalized (log-) probabilities. Over the past years, the machine learning community …
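As a concrete illustration of the trick this survey reviews: perturbing the unnormalized log-probabilities with independent Gumbel(0, 1) noise and taking the argmax produces an exact sample from the corresponding categorical distribution. A minimal NumPy sketch (the function name and seed are our own):

```python
import numpy as np

def gumbel_max_sample(logits, rng):
    # Gumbel(0, 1) noise via inverse transform: -log(-log(U)), U ~ Uniform(0, 1).
    gumbels = -np.log(-np.log(rng.uniform(size=len(logits))))
    # Adding the noise to the unnormalized log-probabilities and taking the
    # argmax returns an exact draw from Categorical(softmax(logits)).
    return int(np.argmax(logits + gumbels))

rng = np.random.default_rng(0)
logits = np.array([1.0, 0.5, -0.5])
samples = [gumbel_max_sample(logits, rng) for _ in range(100_000)]
print(np.bincount(samples, minlength=3) / len(samples))  # empirical frequencies
print(np.exp(logits) / np.exp(logits).sum())             # target softmax(logits)
```

The empirical frequencies converge to softmax(logits), which is what makes the trick a drop-in sampler for unnormalized log-probabilities.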

Data science applications to string theory

F Ruehle - Physics Reports, 2020 - Elsevier
We first introduce various algorithms and techniques for machine learning and data science.
While there is a strong focus on neural network applications in unsupervised, supervised …

wav2vec 2.0: A framework for self-supervised learning of speech representations

A Baevski, Y Zhou, A Mohamed… - Advances in Neural …, 2020 - proceedings.neurips.cc
We show for the first time that learning powerful representations from speech audio alone
followed by fine-tuning on transcribed speech can outperform the best semi-supervised …

Learning graph structures with transformer for multivariate time-series anomaly detection in IoT

Z Chen, D Chen, X Zhang, Z Yuan… - IEEE Internet of Things …, 2021 - ieeexplore.ieee.org
Many real-world Internet of Things (IoT) systems, which include a variety of Internet-
connected sensory devices, produce substantial amounts of multivariate time-series data …

Argmax flows and multinomial diffusion: Learning categorical distributions

E Hoogeboom, D Nielsen, P Jaini… - Advances in Neural …, 2021 - proceedings.neurips.cc
Generative flows and diffusion models have been predominantly trained on ordinal data, for
example natural images. This paper introduces two extensions of flows and diffusion for …

Regularized vector quantization for tokenized image synthesis

J Zhang, F Zhan, C Theobalt… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Quantizing images into discrete representations has been a fundamental problem in unified
generative modeling. Predominant approaches learn the discrete representation either in a …
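As background for what "quantizing images into discrete representations" involves, below is a minimal NumPy sketch of the standard VQ-VAE-style step: each continuous encoder feature is replaced by its nearest codebook entry, and that entry's index serves as the image token. The shapes and the random codebook are illustrative assumptions; this paper's contribution concerns regularizing the quantization, which the sketch does not show.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))      # 512 code vectors of dimension 64
features = rng.normal(size=(16 * 16, 64))  # encoder output for a 16x16 feature map

# Squared Euclidean distance from every feature vector to every code vector.
dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
tokens = dists.argmin(axis=1)   # discrete token ids, shape (256,)
quantized = codebook[tokens]    # quantized features passed on to the decoder
```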

Searching for a robust neural architecture in four GPU hours

X Dong, Y Yang - Proceedings of the IEEE/CVF conference …, 2019 - openaccess.thecvf.com
Conventional neural architecture search (NAS) approaches are usually based on
reinforcement learning or evolutionary strategy, which take more than 1000 GPU hours to …

Chasing sparsity in vision transformers: An end-to-end exploration

T Chen, Y Cheng, Z Gan, L Yuan… - Advances in Neural …, 2021 - proceedings.neurips.cc
Vision transformers (ViTs) have recently surged in popularity, but their enormous
model sizes and training costs remain daunting. Conventional post-training pruning often …

Deep graph reprogramming

Y Jing, C Yuan, L Ju, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper, we explore a novel model-reusing task tailored for graph neural networks
(GNNs), termed "deep graph reprogramming". We strive to reprogram a pre-trained GNN …

Learning to explain: An information-theoretic perspective on model interpretation

J Chen, L Song, M Wainwright… - … conference on machine …, 2018 - proceedings.mlr.press
We introduce instancewise feature selection as a methodology for model interpretation. Our
method is based on learning a function to extract a subset of features that are most …
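The subset-selection step here ties back to the Gumbel-max trick that opens this list: perturbing per-feature scores with Gumbel(0, 1) noise and keeping the k largest (Gumbel-top-k) samples k distinct features with probabilities proportional to exp(scores). A minimal hard-selection sketch under that assumption, with a hypothetical learned scoring function; to our understanding the published method uses a continuous Gumbel-softmax relaxation so that selection stays differentiable, which this hard version omits.

```python
import numpy as np

def sample_feature_subset(x, scores, k, rng):
    # Gumbel-top-k: add Gumbel(0, 1) noise to the per-feature scores and keep
    # the k largest, a without-replacement sample proportional to exp(scores).
    gumbels = -np.log(-np.log(rng.uniform(size=scores.shape)))
    keep = np.argsort(scores + gumbels)[-k:]
    mask = np.zeros_like(x)
    mask[keep] = 1.0
    return x * mask  # masked instance is fed to the model being interpreted

rng = np.random.default_rng(0)
x = rng.normal(size=10)       # one input instance with 10 features
scores = rng.normal(size=10)  # hypothetical learned per-feature scores
print(sample_feature_subset(x, scores, k=3, rng=rng))
```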