FlashAttention: Fast and memory-efficient exact attention with IO-awareness
Transformers are slow and memory-hungry on long sequences, since the time and memory
complexity of self-attention are quadratic in sequence length. Approximate attention …
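To make the quadratic claim concrete, below is a minimal NumPy sketch of standard softmax attention, the baseline that FlashAttention accelerates; it materializes the full n-by-n score matrix, which is exactly the memory bottleneck. All names and sizes are illustrative, and this is not the FlashAttention algorithm itself.

import numpy as np

def naive_attention(Q, K, V):
    # Standard softmax attention; Q, K, V have shape (n, d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])        # (n, n) matrix: quadratic memory
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                             # output is (n, d)

rng = np.random.default_rng(0)
n, d = 1024, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = naive_attention(Q, K, V)  # the (n, n) intermediate dominates time and memory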
OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization
AlphaFold2 revolutionized structural biology with the ability to predict protein structures with
exceptionally high accuracy. Its implementation, however, lacks the code and data required …
Diagonal state spaces are as effective as structured state spaces
Modeling long range dependencies in sequential data is a fundamental step towards
attaining human-level performance in many modalities such as text, vision, audio and video …
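As a rough illustration of what a diagonal state space computes, here is a toy NumPy sketch of the discrete recurrence with a diagonal state matrix, under which each state channel evolves independently. Parameter values and names are made up for the example; this is not the paper's implementation.

import numpy as np

def diagonal_ssm(u, lam, B, C):
    # u: input sequence (L,); lam: diagonal of the state matrix (N,), complex;
    # B, C: input and output vectors (N,). Recurrence: x_k = lam * x_{k-1} + B * u_k.
    x = np.zeros_like(lam)
    ys = []
    for u_k in u:
        x = lam * x + B * u_k          # elementwise: channels do not interact
        ys.append((C * x).sum().real)  # y_k = Re(C . x_k)
    return np.array(ys)

N = 8
lam = np.exp(-0.05 + 1j * np.linspace(0.1, 3.0, N))  # stable diagonal dynamics (|lam| < 1)
u = np.sin(np.linspace(0.0, 10.0, 100))
y = diagonal_ssm(u, lam, np.ones(N), np.ones(N) / N)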
Interpolating between optimal transport and MMD using Sinkhorn divergences
Comparing probability distributions is a fundamental problem in data sciences. Simple
norms and divergences such as the total variation and the relative entropy only compare …
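The core computation behind these divergences is the Sinkhorn fixed-point iteration for entropically regularized OT; the regularization eps drives the interpolation, with small eps approaching OT and large eps approaching an MMD-like kernel distance. A minimal, unoptimized NumPy sketch follows (the paper's debiased Sinkhorn divergence additionally subtracts the two self-transport terms):

import numpy as np

def sinkhorn_cost(x, y, eps=0.1, iters=200):
    # Entropic OT cost between uniform point clouds x (n, d) and y (m, d).
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared-distance cost
    K = np.exp(-C / eps)                                # Gibbs kernel
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(iters):            # alternate scalings to match both marginals
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]   # transport plan
    return (P * C).sum()

rng = np.random.default_rng(0)
x = rng.standard_normal((50, 2))
y = rng.standard_normal((60, 2)) + 1.0
print(sinkhorn_cost(x, y, eps=0.5))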
POT: Python Optimal Transport
Optimal transport has recently been reintroduced to the machine learning community thanks
in part to novel efficient optimization procedures allowing for medium to large scale …
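A small usage sketch of POT's top-level API (exact signatures may differ slightly across versions): ot.dist builds a cost matrix, ot.emd solves the exact linear program, and ot.sinkhorn solves the entropically regularized problem.

import numpy as np
import ot  # pip install pot

rng = np.random.default_rng(0)
xs = rng.standard_normal((40, 2))         # source samples
xt = rng.standard_normal((50, 2)) + 2.0   # target samples
a = np.full(40, 1.0 / 40)                 # uniform source weights
b = np.full(50, 1.0 / 50)                 # uniform target weights

M = ot.dist(xs, xt)                       # squared Euclidean cost matrix
G_exact = ot.emd(a, b, M)                 # exact OT plan (linear program)
G_reg = ot.sinkhorn(a, b, M, reg=0.1)     # entropically regularized plan
print((G_exact * M).sum(), (G_reg * M).sum())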
Fast end-to-end learning on protein surfaces
Proteins' biological functions are defined by the geometric and chemical structure of their 3D
molecular surfaces. Recent works have shown that geometric deep learning can be used on …
Liquid structural state-space models
A proper parametrization of state transition matrices of linear state-space models (SSMs)
followed by standard nonlinearities enables them to efficiently learn representations from …
Differentiable graph module (DGM) for graph convolutional networks
Graph deep learning has recently emerged as a powerful ML concept that generalizes
successful deep neural architectures to non-Euclidean structured data. Such methods have …
Modern applications of machine learning in quantum sciences
In these Lecture Notes, we provide a comprehensive introduction to the most recent
advances in the application of machine learning methods in quantum sciences. We cover …
Sinkformers: Transformers with doubly stochastic attention
Attention based models such as Transformers involve pairwise interactions between data
points, modeled with a learnable attention matrix. Importantly, this attention matrix is …
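A rough NumPy sketch of the idea: instead of the row-wise softmax, which makes the attention matrix only row-stochastic, alternate row and column normalizations (Sinkhorn iterations) so the matrix converges toward a doubly stochastic one. This is illustrative code, not the authors' implementation.

import numpy as np

def sinkhorn_attention(Q, K, V, iters=3):
    logits = Q @ K.T / np.sqrt(Q.shape[-1])
    A = np.exp(logits - logits.max())      # positive kernel matrix
    for _ in range(iters):                 # alternating Sinkhorn normalizations
        A /= A.sum(axis=1, keepdims=True)  # make rows sum to 1
        A /= A.sum(axis=0, keepdims=True)  # make columns sum to 1
    return A @ V

rng = np.random.default_rng(0)
n, d = 16, 8
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = sinkhorn_attention(Q, K, V)  # rows and columns are approximately stochastic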