FlashAttention: Fast and memory-efficient exact attention with IO-awareness

T Dao, D Fu, S Ermon, A Rudra… - Advances in Neural …, 2022 - proceedings.neurips.cc
Transformers are slow and memory-hungry on long sequences, since the time and memory
complexity of self-attention are quadratic in sequence length. Approximate attention …

OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization

G Ahdritz, N Bouatta, C Floristean, S Kadyan, Q Xia… - Nature …, 2024 - nature.com
AlphaFold2 revolutionized structural biology with the ability to predict protein structures with
exceptionally high accuracy. Its implementation, however, lacks the code and data required …

Diagonal state spaces are as effective as structured state spaces

A Gupta, A Gu, J Berant - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Modeling long-range dependencies in sequential data is a fundamental step towards
attaining human-level performance in many modalities such as text, vision, audio and video …

Interpolating between optimal transport and MMD using Sinkhorn divergences

J Feydy, T Séjourné, FX Vialard… - The 22nd …, 2019 - proceedings.mlr.press
Comparing probability distributions is a fundamental problem in data sciences. Simple
norms and divergences such as the total variation and the relative entropy only compare …

POT: Python Optimal Transport

R Flamary, N Courty, A Gramfort, MZ Alaya… - Journal of Machine …, 2021 - jmlr.org
Optimal transport has recently been reintroduced to the machine learning community thanks
in part to novel efficient optimization procedures allowing for medium to large scale …

Fast end-to-end learning on protein surfaces

F Sverrisson, J Feydy, BE Correia… - Proceedings of the …, 2021 - openaccess.thecvf.com
Proteins' biological functions are defined by the geometric and chemical structure of their 3D
molecular surfaces. Recent works have shown that geometric deep learning can be used on …

Liquid structural state-space models

R Hasani, M Lechner, TH Wang, M Chahine… - arXiv preprint arXiv …, 2022 - arxiv.org
A proper parametrization of state transition matrices of linear state-space models (SSMs)
followed by standard nonlinearities enables them to efficiently learn representations from …

Differentiable graph module (DGM) for graph convolutional networks

A Kazi, L Cosmo, SA Ahmadi, N Navab… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org
Graph deep learning has recently emerged as a powerful ML concept, allowing successful
deep neural architectures to be generalized to non-Euclidean structured data. Such methods have …

Modern applications of machine learning in quantum sciences

A Dawid, J Arnold, B Requena, A Gresch… - arXiv preprint arXiv …, 2022 - arxiv.org
In these Lecture Notes, we provide a comprehensive introduction to the most recent
advances in the application of machine learning methods in quantum sciences. We cover …

Sinkformers: Transformers with doubly stochastic attention

ME Sander, P Ablin, M Blondel… - … Conference on Artificial …, 2022 - proceedings.mlr.press
Attention based models such as Transformers involve pairwise interactions between data
points, modeled with a learnable attention matrix. Importantly, this attention matrix is …