A review of the Gumbel-max trick and its extensions for discrete stochasticity in machine learning

IAM Huijben, W Kool, MB Paulus… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
The Gumbel-max trick is a method to draw a sample from a categorical distribution, given by
its unnormalized (log-) probabilities. Over the past years, the machine learning community …
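The trick described in the snippet admits a two-line implementation: perturb each unnormalized log-probability with independent Gumbel(0, 1) noise and take the argmax. A minimal NumPy sketch (function name and RNG handling are illustrative, not from the paper):

```python
import numpy as np

def gumbel_max_sample(logits, rng=None):
    """Draw one index from Categorical(softmax(logits)) via the Gumbel-max trick:
    add i.i.d. Gumbel(0, 1) noise to the unnormalized log-probabilities and take
    the argmax. Gumbel noise is generated as -log(-log(U)) with U ~ Uniform(0, 1)."""
    rng = rng or np.random.default_rng()
    gumbel_noise = -np.log(-np.log(rng.uniform(size=np.shape(logits))))
    return int(np.argmax(np.asarray(logits) + gumbel_noise))
```

Repeated draws reproduce the softmax probabilities of the logits, which is the defining property of the trick.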

wav2vec 2.0: A framework for self-supervised learning of speech representations

A Baevski, Y Zhou, A Mohamed… - Advances in neural …, 2020 - proceedings.neurips.cc
We show for the first time that learning powerful representations from speech audio alone
followed by fine-tuning on transcribed speech can outperform the best semi-supervised …

Data science applications to string theory

F Ruehle - Physics Reports, 2020 - Elsevier
We first introduce various algorithms and techniques for machine learning and data science.
While there is a strong focus on neural network applications in unsupervised, supervised …

Categorical reparameterization with Gumbel-softmax

E Jang, S Gu, B Poole - arXiv preprint arXiv:1611.01144, 2016 - arxiv.org
Categorical variables are a natural choice for representing discrete structure in the world.
However, stochastic neural networks rarely use categorical latent variables due to the …

The concrete distribution: A continuous relaxation of discrete random variables

CJ Maddison, A Mnih, YW Teh - arXiv preprint arXiv:1611.00712, 2016 - arxiv.org
The reparameterization trick enables optimizing large scale stochastic computation graphs
via gradient descent. The essence of the trick is to refactor each stochastic node into a …
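For a categorical node, the refactoring the snippet describes is the Concrete (Gumbel-softmax) relaxation: the fixed-distribution randomness is Gumbel noise, and the differentiable function is a temperature-scaled softmax. A NumPy sketch, assuming a temperature parameter `tau` (names are illustrative):

```python
import numpy as np

def concrete_sample(logits, tau=0.5, rng=None):
    """Sample a relaxed one-hot vector from the Concrete (Gumbel-softmax)
    distribution: softmax((logits + Gumbel noise) / tau). As tau -> 0 the
    sample approaches the discrete one-hot of the Gumbel-max trick; larger
    tau gives smoother, lower-variance gradients."""
    rng = rng or np.random.default_rng()
    g = -np.log(-np.log(rng.uniform(size=np.shape(logits))))
    y = (np.asarray(logits) + g) / tau
    y = y - y.max()          # subtract max for numerical stability
    e = np.exp(y)
    return e / e.sum()
```

The output is a point on the probability simplex, so downstream computation stays differentiable in `logits`, which is what makes gradient descent through the stochastic node possible.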

Argmax flows and multinomial diffusion: Learning categorical distributions

E Hoogeboom, D Nielsen, P Jaini… - Advances in Neural …, 2021 - proceedings.neurips.cc
Generative flows and diffusion models have been predominantly trained on ordinal data, for
example natural images. This paper introduces two extensions of flows and diffusion for …

Learning graph structures with transformer for multivariate time-series anomaly detection in IoT

Z Chen, D Chen, X Zhang, Z Yuan… - IEEE Internet of Things …, 2021 - ieeexplore.ieee.org
Many real-world Internet of Things (IoT) systems, which include a variety of Internet-
connected sensory devices, produce substantial amounts of multivariate time-series data …

Searching for a robust neural architecture in four GPU hours

X Dong, Y Yang - Proceedings of the IEEE/CVF conference …, 2019 - openaccess.thecvf.com
Conventional neural architecture search (NAS) approaches are usually based on
reinforcement learning or evolutionary strategy, which take more than 1000 GPU hours to …

Learning to explain: An information-theoretic perspective on model interpretation

J Chen, L Song, M Wainwright… - … conference on machine …, 2018 - proceedings.mlr.press
We introduce instancewise feature selection as a methodology for model interpretation. Our
method is based on learning a function to extract a subset of features that are most …

Chasing sparsity in vision transformers: An end-to-end exploration

T Chen, Y Cheng, Z Gan, L Yuan… - Advances in Neural …, 2021 - proceedings.neurips.cc
Vision transformers (ViTs) have recently received explosive popularity, but their enormous
model sizes and training costs remain daunting. Conventional post-training pruning often …