Depen Morwani
Verified email at g.harvard.edu
Title
Cited by
Year
Feature-learning networks are consistent across widths at realistic scales
N Vyas, A Atanasov, B Bordelon, D Morwani, S Sainathan, C Pehlevan
Advances in Neural Information Processing Systems 36, 2024
Cited by 25, year 2024
Simplicity bias in 1-hidden layer neural networks
D Morwani, J Batra, P Jain, P Netrapalli
Advances in Neural Information Processing Systems 36, 2024
Cited by 13, year 2024
Feature emergence via margin maximization: case studies in algebraic tasks
D Morwani, BL Edelman, CA Oncescu, R Zhao, S Kakade
arXiv preprint arXiv:2311.07568, 2023
Cited by 12, year 2023
SOAP: Improving and stabilizing Shampoo using Adam
N Vyas, D Morwani, R Zhao, I Shapira, D Brandfonbrener, L Janson, ...
arXiv preprint arXiv:2409.11321, 2024
Cited by 11, year 2024
Deconstructing what makes a good optimizer for language models
R Zhao, D Morwani, D Brandfonbrener, N Vyas, S Kakade
arXiv preprint arXiv:2407.07972, 2024
Cited by 10, year 2024
Inductive bias of gradient descent for weight normalized smooth homogeneous neural nets
D Morwani, HG Ramaswamy
International Conference on Algorithmic Learning Theory, 827-880, 2022
Cited by 6, year 2022
A New Perspective on Shampoo's Preconditioner
D Morwani, I Shapira, N Vyas, E Malach, S Kakade, L Janson
arXiv preprint arXiv:2406.17748, 2024
Cited by 5, year 2024
Beyond implicit bias: The insignificance of SGD noise in online learning
N Vyas, D Morwani, R Zhao, G Kaplun, S Kakade, B Barak
arXiv preprint arXiv:2306.08590, 2023
Cited by 2, year 2023
Using noise resilience for ranking generalization of deep neural networks
D Morwani, R Vashisht, HG Ramaswamy
arXiv preprint arXiv:2012.08854, 2020
Cited by 2, year 2020
AdaMeM: Memory Efficient Momentum for Adafactor
N Vyas, D Morwani, SM Kakade
2nd Workshop on Advancing Neural Network Training: Computational Efficiency …
Cited by 2
How Does Critical Batch Size Scale in Pre-training?
H Zhang, D Morwani, N Vyas, J Wu, D Zou, U Ghai, D Foster, S Kakade
arXiv preprint arXiv:2410.21676, 2024
Cited by 1, year 2024
Inductive bias of gradient descent for exponentially weight normalized smooth homogeneous neural nets
D Morwani, HG Ramaswamy
arXiv preprint arXiv:2010.12909, 2020
Cited by 1, year 2020
Connections between Schedule-Free SGD, Accelerated SGD Variants, and Weight Averaging
D Morwani, N Vyas, H Zhang, SM Kakade
OPT 2024: Optimization for Machine Learning