Feature-learning networks are consistent across widths at realistic scales N Vyas, A Atanasov, B Bordelon, D Morwani, S Sainathan, C Pehlevan Advances in Neural Information Processing Systems 36, 2024 | 25 | 2024 |
Simplicity bias in 1-hidden layer neural networks D Morwani, J Batra, P Jain, P Netrapalli Advances in Neural Information Processing Systems 36, 2024 | 13 | 2024 |
Feature emergence via margin maximization: case studies in algebraic tasks D Morwani, BL Edelman, CA Oncescu, R Zhao, S Kakade arXiv preprint arXiv:2311.07568, 2023 | 12 | 2023 |
SOAP: Improving and stabilizing Shampoo using Adam N Vyas, D Morwani, R Zhao, I Shapira, D Brandfonbrener, L Janson, ... arXiv preprint arXiv:2409.11321, 2024 | 11 | 2024 |
Deconstructing what makes a good optimizer for language models R Zhao, D Morwani, D Brandfonbrener, N Vyas, S Kakade arXiv preprint arXiv:2407.07972, 2024 | 10 | 2024 |
Inductive bias of gradient descent for weight normalized smooth homogeneous neural nets D Morwani, HG Ramaswamy International Conference on Algorithmic Learning Theory, 827-880, 2022 | 6 | 2022 |
A New Perspective on Shampoo's Preconditioner D Morwani, I Shapira, N Vyas, E Malach, S Kakade, L Janson arXiv preprint arXiv:2406.17748, 2024 | 5 | 2024 |
Beyond implicit bias: The insignificance of SGD noise in online learning N Vyas, D Morwani, R Zhao, G Kaplun, S Kakade, B Barak arXiv preprint arXiv:2306.08590, 2023 | 2 | 2023 |
Using noise resilience for ranking generalization of deep neural networks D Morwani, R Vashisht, HG Ramaswamy arXiv preprint arXiv:2012.08854, 2020 | 2 | 2020 |
AdaMeM: Memory Efficient Momentum for Adafactor N Vyas, D Morwani, SM Kakade 2nd Workshop on Advancing Neural Network Training: Computational Efficiency … | 2 | |
How Does Critical Batch Size Scale in Pre-training? H Zhang, D Morwani, N Vyas, J Wu, D Zou, U Ghai, D Foster, S Kakade arXiv preprint arXiv:2410.21676, 2024 | 1 | 2024 |
Inductive bias of gradient descent for exponentially weight normalized smooth homogeneous neural nets D Morwani, HG Ramaswamy arXiv preprint arXiv:2010.12909, 2020 | 1 | 2020 |
Connections between Schedule-Free SGD, Accelerated SGD Variants, and Weight Averaging D Morwani, N Vyas, H Zhang, SM Kakade OPT 2024: Optimization for Machine Learning | | |