Johannes von Oswald
Research Scientist, Google
Verified email at google.com - Homepage
Title · Cited by · Year
Transformers learn in-context by gradient descent
J Von Oswald, E Niklasson, E Randazzo, J Sacramento, A Mordvintsev, ...
ICML 2023 (Oral), 2023
459 · 2023
Continual learning with hypernetworks
J von Oswald, C Henning, BF Grewe, J Sacramento
ICLR 2020 (Spotlight), 2019
445 · 2019
Posterior meta-replay for continual learning
C Henning, M Cervera, F D'Angelo, J Von Oswald, R Traber, B Ehret, ...
NeurIPS 2021, 2021
73 · 2021
Continual learning in recurrent neural networks
B Ehret, C Henning, MR Cervera, A Meulemans, J Von Oswald, BF Grewe
ICLR 2021, 2020
61* · 2020
Uncovering mesa-optimization algorithms in transformers
J Von Oswald, M Schlegel, A Meulemans, S Kobayashi, E Niklasson, ...
arXiv preprint arXiv:2309.05858, 2023
60* · 2023
Learning where to learn: Gradient sparsity in meta and continual learning
J Von Oswald, D Zhao, S Kobayashi, S Schug, M Caccia, N Zucchet, ...
NeurIPS 2021, 2021
60 · 2021
Meta-Learning via Hypernetworks
D Zhao, S Kobayashi, J Sacramento, J von Oswald
4th Workshop on Meta-Learning at NeurIPS 2020, Vancouver, Canada, 2020
59 · 2020
Neural networks with late-phase weights
J von Oswald, S Kobayashi, A Meulemans, C Henning, BF Grewe, ...
ICLR 2021, 2020
38 · 2020
The least-control principle for local learning at equilibrium
A Meulemans, N Zucchet, S Kobayashi, J Von Oswald, J Sacramento
NeurIPS 2022 (Oral), 2022
27 · 2022
Random initialisations performing above chance and how to find them
F Benzing, S Schug, R Meier, J Von Oswald, Y Akram, N Zucchet, ...
OPT2022: 14th Annual Workshop on Optimization for Machine Learning, 2022
25 · 2022
Approximating the predictive distribution via adversarially-trained hypernetworks
C Henning, J von Oswald, J Sacramento, SC Surace, JP Pfister, ...
Bayesian Deep Learning Workshop, NeurIPS 2018 (Spotlight), 2018
25 · 2018
A contrastive rule for meta-learning
N Zucchet, S Schug, J Von Oswald, D Zhao, J Sacramento
NeurIPS 2022, 2022
24 · 2022
Discovering modular solutions that generalize compositionally
S Schug, S Kobayashi, Y Akram, M Wołczyk, A Proca, J Von Oswald, ...
ICLR 2024, 2023
12 · 2023
Gated recurrent neural networks discover attention
N Zucchet, S Kobayashi, Y Akram, J Von Oswald, M Larcher, A Steger, ...
arXiv preprint arXiv:2309.01775, 2023
10 · 2023
On the reversed bias-variance tradeoff in deep ensembles
S Kobayashi, J von Oswald, BF Grewe
ICML 2021 Workshop on Uncertainty and Robustness in Deep Learning, 2021
9 · 2021
Adversarial robustness of in-context learning in transformers for linear regression
U Anwar, J Von Oswald, L Kirsch, D Krueger, S Frei
arXiv preprint arXiv:2411.05189, 2024
4 · 2024
Linear Transformers are Versatile In-Context Learners
M Vladymyrov, J Von Oswald, M Sandler, R Ge
NeurIPS 2024, 2024
3 · 2024
Weight decay induces low-rank attention layers
S Kobayashi, Y Akram, J Von Oswald
NeurIPS 2024, 2024
2 · 2024
When can transformers compositionally generalize in-context?
S Kobayashi, S Schug, Y Akram, F Redhardt, J von Oswald, R Pascanu, ...
ICML 2024 Next Generation of Sequence Modeling Architectures Workshop, 2024
1 · 2024
Disentangling the predictive variance of deep ensembles through the neural tangent kernel
S Kobayashi, P Vilimelis Aceituno, J Von Oswald
NeurIPS 2022, 2022
1 · 2022
Articles 1–20