Toward Effective Neural Architectures and Algorithms for Generalizable Deep Learning

M Li - 2024 - deepblue.lib.umich.edu
This thesis explores the complexities of overparameterization in neural networks, where
models with a large number of parameters have the potential to quickly fit and generalize …

[PDF][PDF] Composite Attention: A Framework for Combining Sequence Mixing Primitives

HJ Cunningham, MP Deisenroth - hjakecunningham.github.io
Hybrid attention architectures have shown promising success in both equipping self-attention with inductive bias for long-sequence modelling and reducing the computational …