A review of the Gumbel-max trick and its extensions for discrete stochasticity in machine learning

IAM Huijben, W Kool, MB Paulus… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
The Gumbel-max trick is a method to draw a sample from a categorical distribution, given by
its unnormalized (log-) probabilities. Over the past years, the machine learning community …
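The trick described in the snippet admits a two-line implementation: perturb each unnormalized log-probability with independent Gumbel(0, 1) noise and take the argmax. A minimal NumPy sketch (function name and RNG handling are illustrative, not from the paper):

```python
import numpy as np

def gumbel_max_sample(logits, rng=None):
    """Draw one index from Categorical(softmax(logits)) via the Gumbel-max trick:
    add i.i.d. Gumbel(0, 1) noise to the unnormalized log-probabilities and take
    the argmax. Gumbel noise is generated as -log(-log(U)) with U ~ Uniform(0, 1)."""
    rng = rng or np.random.default_rng()
    gumbel_noise = -np.log(-np.log(rng.uniform(size=np.shape(logits))))
    return int(np.argmax(np.asarray(logits) + gumbel_noise))
```

Repeated draws reproduce the softmax probabilities of the logits, which is the defining property of the trick.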

wav2vec 2.0: A framework for self-supervised learning of speech representations

A Baevski, Y Zhou, A Mohamed… - Advances in neural …, 2020 - proceedings.neurips.cc
We show for the first time that learning powerful representations from speech audio alone
followed by fine-tuning on transcribed speech can outperform the best semi-supervised …

Data science applications to string theory

F Ruehle - Physics Reports, 2020 - Elsevier
We first introduce various algorithms and techniques for machine learning and data science.
While there is a strong focus on neural network applications in unsupervised, supervised …

Categorical reparameterization with Gumbel-softmax

E Jang, S Gu, B Poole - arXiv preprint arXiv:1611.01144, 2016 - arxiv.org
Categorical variables are a natural choice for representing discrete structure in the world.
However, stochastic neural networks rarely use categorical latent variables due to the …

The concrete distribution: A continuous relaxation of discrete random variables

CJ Maddison, A Mnih, YW Teh - arXiv preprint arXiv:1611.00712, 2016 - arxiv.org
The reparameterization trick enables optimizing large scale stochastic computation graphs
via gradient descent. The essence of the trick is to refactor each stochastic node into a …
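For a categorical node, the refactoring the snippet describes is the Concrete (Gumbel-softmax) relaxation: the fixed-distribution randomness is Gumbel noise, and the differentiable function is a temperature-scaled softmax. A NumPy sketch, assuming a temperature parameter `tau` (names are illustrative):

```python
import numpy as np

def concrete_sample(logits, tau=0.5, rng=None):
    """Sample a relaxed one-hot vector from the Concrete (Gumbel-softmax)
    distribution: softmax((logits + Gumbel noise) / tau). As tau -> 0 the
    sample approaches the discrete one-hot of the Gumbel-max trick; larger
    tau gives smoother, lower-variance gradients."""
    rng = rng or np.random.default_rng()
    g = -np.log(-np.log(rng.uniform(size=np.shape(logits))))
    y = (np.asarray(logits) + g) / tau
    y = y - y.max()          # subtract max for numerical stability
    e = np.exp(y)
    return e / e.sum()
```

The output is a point on the probability simplex, so downstream computation stays differentiable in `logits`, which is what makes gradient descent through the stochastic node possible.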

Argmax flows and multinomial diffusion: Learning categorical distributions

E Hoogeboom, D Nielsen, P Jaini… - Advances in Neural …, 2021 - proceedings.neurips.cc
Generative flows and diffusion models have been predominantly trained on ordinal data, for
example natural images. This paper introduces two extensions of flows and diffusion for …

Learning graph structures with transformer for multivariate time-series anomaly detection in IoT

Z Chen, D Chen, X Zhang, Z Yuan… - IEEE Internet of Things …, 2021 - ieeexplore.ieee.org
Many real-world Internet of Things (IoT) systems, which include a variety of Internet-
connected sensory devices, produce substantial amounts of multivariate time-series data …

Searching for a robust neural architecture in four GPU hours

X Dong, Y Yang - Proceedings of the IEEE/CVF conference …, 2019 - openaccess.thecvf.com
Conventional neural architecture search (NAS) approaches are usually based on
reinforcement learning or evolutionary strategy, which take more than 1000 GPU hours to …

Learning to explain: An information-theoretic perspective on model interpretation

J Chen, L Song, M Wainwright… - … conference on machine …, 2018 - proceedings.mlr.press
We introduce instancewise feature selection as a methodology for model interpretation. Our
method is based on learning a function to extract a subset of features that are most …

Chasing sparsity in vision transformers: An end-to-end exploration

T Chen, Y Cheng, Z Gan, L Yuan… - Advances in Neural …, 2021 - proceedings.neurips.cc
Vision transformers (ViTs) have recently received explosive popularity, but their enormous
model sizes and training costs remain daunting. Conventional post-training pruning often …