On neural differential equations

P Kidger - ar…
… learning algorithms that infer independent and symmetric entities from the perceptual input. This often …

On training implicit models

Z Geng, XY Zhang, S Bai, Y Wang… - Advances in Neural …, 2021 - proceedings.neurips.cc
This paper focuses on training implicit models of infinite layers. Specifically, previous works
employ implicit differentiation and solve the exact gradient for the backward propagation …

Deep equilibrium approaches to diffusion models

A Pokle, Z Geng, JZ Kolter - Advances in Neural …, 2022 - proceedings.neurips.cc
Diffusion-based generative models are extremely effective in generating high-quality
images, with generated samples often surpassing the quality of those produced by other …

Recurrence without recurrence: Stable video landmark detection with deep equilibrium models

P Micaelli, A Vahdat, H Yin, J Kautz… - Proceedings of the …, 2023 - openaccess.thecvf.com
Cascaded computation, whereby predictions are recurrently refined over several stages, has
been a persistent theme throughout the development of landmark detection models. In this …

Path independent equilibrium models can better exploit test-time computation

C Anil, A Pokle, K Liang, J Treutlein… - Advances in …, 2022 - proceedings.neurips.cc
Designing networks capable of attaining better performance with an increased inference
budget is important to facilitate generalization to harder problem instances. Recent efforts …

Exploiting connections between Lipschitz structures for certifiably robust deep equilibrium models

A Havens, A Araujo, S Garg… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recently, deep equilibrium models (DEQs) have drawn increasing attention from the
machine learning community. However, DEQs are much less understood in terms of certified …

Looped transformers are better at learning learning algorithms

L Yang, K Lee, R Nowak, D Papailiopoulos - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have demonstrated effectiveness in in-context solving data-fitting problems
from various (latent) models, as reported by Garg et al. However, the absence of an inherent …