A review of the gumbel-max trick and its extensions for discrete stochasticity in machine learning
IAM Huijben, W Kool, MB Paulus… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
The Gumbel-max trick is a method to draw a sample from a categorical distribution, given by
its unnormalized (log-) probabilities. Over the past years, the machine learning community …
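The trick described in this abstract can be sketched in a few lines: add independent Gumbel(0, 1) noise to the unnormalized log-probabilities and take the argmax, which yields an exact categorical sample. A minimal NumPy sketch (function name and interface are illustrative, not from the paper):

```python
import numpy as np

def gumbel_max_sample(logits, rng=None):
    """Draw one exact sample from Categorical(softmax(logits)) via the Gumbel-max trick."""
    rng = rng or np.random.default_rng()
    # Gumbel(0, 1) noise: -log(-log(U)), with U ~ Uniform(0, 1)
    u = rng.uniform(size=len(logits))
    gumbels = -np.log(-np.log(u))
    # The argmax of the perturbed logits is distributed as softmax(logits).
    return int(np.argmax(logits + gumbels))
```

Because only the argmax matters, the logits never need to be normalized, which is what makes the trick convenient for unnormalized (log-) probabilities.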
wav2vec 2.0: A framework for self-supervised learning of speech representations
We show for the first time that learning powerful representations from speech audio alone
followed by fine-tuning on transcribed speech can outperform the best semi-supervised …
[HTML][HTML] Data science applications to string theory
F Ruehle - Physics Reports, 2020 - Elsevier
We first introduce various algorithms and techniques for machine learning and data science.
While there is a strong focus on neural network applications in unsupervised, supervised …
Categorical reparameterization with gumbel-softmax
Categorical variables are a natural choice for representing discrete structure in the world.
However, stochastic neural networks rarely use categorical latent variables due to the …
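The relaxation this paper introduces replaces the hard argmax of the Gumbel-max trick with a temperature-controlled softmax, producing a differentiable approximation of a one-hot categorical sample. A minimal sketch, assuming NumPy and an illustrative function name:

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Differentiable relaxation: softmax((logits + Gumbel noise) / tau)."""
    rng = rng or np.random.default_rng()
    g = -np.log(-np.log(rng.uniform(size=np.shape(logits))))
    y = (np.asarray(logits) + g) / tau
    y = y - y.max()  # subtract max for numerical stability
    e = np.exp(y)
    return e / e.sum()
```

As the temperature `tau` approaches 0, samples approach one-hot vectors; larger `tau` gives smoother, more uniform outputs, trading bias for gradient variance.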
The concrete distribution: A continuous relaxation of discrete random variables
The reparameterization trick enables optimizing large scale stochastic computation graphs
via gradient descent. The essence of the trick is to refactor each stochastic node into a …
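The refactoring this abstract describes is the reparameterization trick: a stochastic node is rewritten as a deterministic, differentiable function of its parameters and a noise variable with a fixed distribution, so gradients can flow through the parameters. A minimal sketch for a Gaussian node (names are illustrative):

```python
import numpy as np

def reparameterized_normal(mu, sigma, rng=None):
    """Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps, eps ~ N(0, 1)."""
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal()
    # mu and sigma enter through a deterministic function of fixed noise,
    # so d(z)/d(mu) = 1 and d(z)/d(sigma) = eps are well defined.
    return mu + sigma * eps
```

The Concrete distribution applies the same idea to discrete nodes: the fixed noise is Gumbel and the deterministic function is a tempered softmax, giving a continuous relaxation of a categorical sample.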
Argmax flows and multinomial diffusion: Learning categorical distributions
Generative flows and diffusion models have been predominantly trained on ordinal data, for
example natural images. This paper introduces two extensions of flows and diffusion for …
Learning graph structures with transformer for multivariate time-series anomaly detection in IoT
Many real-world Internet of Things (IoT) systems, which include a variety of Internet-
connected sensory devices, produce substantial amounts of multivariate time-series data …
Searching for a robust neural architecture in four gpu hours
Conventional neural architecture search (NAS) approaches are usually based on
reinforcement learning or evolutionary strategy, which take more than 1000 GPU hours to …
Learning to explain: An information-theoretic perspective on model interpretation
We introduce instancewise feature selection as a methodology for model interpretation. Our
method is based on learning a function to extract a subset of features that are most …
Chasing sparsity in vision transformers: An end-to-end exploration
Vision transformers (ViTs) have recently received explosive popularity, but their enormous
model sizes and training costs remain daunting. Conventional post-training pruning often …