A survey on model-based reinforcement learning

FM Luo, T Xu, H Lai, XH Chen, W Zhang… - Science China Information …, 2024 - Springer
Reinforcement learning (RL) interacts with the environment to solve sequential decision-
making problems via a trial-and-error approach. Errors are always undesirable in real-world …

A review of the Gumbel-max trick and its extensions for discrete stochasticity in machine learning

IAM Huijben, W Kool, MB Paulus… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
The Gumbel-max trick is a method to draw a sample from a categorical distribution, given by
its unnormalized (log-) probabilities. Over the past years, the machine learning community …

Training diffusion models with reinforcement learning

K Black, M Janner, Y Du, I Kostrikov… - arXiv preprint arXiv …, 2023 - arxiv.org
Diffusion models are a class of flexible generative models trained with an approximation to
the log-likelihood objective. However, most use cases of diffusion models are not concerned …

Recurrent neural network wave functions

M Hibat-Allah, M Ganahl, LE Hayward, RG Melko… - Physical Review …, 2020 - APS
A core technology that has emerged from the artificial intelligence revolution is the recurrent
neural network (RNN). Its unique sequence-based architecture provides a tractable …

Learning generalisable omni-scale representations for person re-identification

K Zhou, Y Yang, A Cavallaro… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
An effective person re-identification (re-ID) model should learn feature representations that
are both discriminative, for distinguishing similar-looking people, and generalisable, for …

Global optimality guarantees for policy gradient methods

J Bhandari, D Russo - Operations Research, 2024 - pubsonline.informs.org
Policy gradient methods apply to complex, poorly understood control problems by
performing stochastic gradient descent over a parameterized class of policies. Unfortunately …

Optimal experimental design: Formulations and computations

X Huan, J Jagalur, Y Marzouk - Acta Numerica, 2024 - cambridge.org
Questions of 'how best to acquire data' are essential to modelling and prediction in the
natural and social sciences, engineering applications, and beyond. Optimal experimental …

Differentiable automatic data augmentation

Y Li, G Hu, Y Wang, T Hospedales… - Computer Vision–ECCV …, 2020 - Springer
Data augmentation (DA) techniques aim to increase data variability, and thus train deep
networks with better generalisation. The pioneering AutoAugment automated the search for …

Differentiable quantum architecture search

SX Zhang, CY Hsieh, S Zhang… - Quantum Science and …, 2022 - iopscience.iop.org
Quantum architecture search (QAS) is the process of automating architecture engineering of
quantum circuits. It has been desired to construct a powerful and general QAS platform …

Tighter risk certificates for neural networks

M Pérez-Ortiz, O Rivasplata, J Shawe-Taylor… - Journal of Machine …, 2021 - jmlr.org
This paper presents an empirical study regarding training probabilistic neural networks
using training objectives derived from PAC-Bayes bounds. In the context of probabilistic …