Deep learning on a data diet: Finding important examples early in training
M Paul, S Ganguli, GK Dziugaite
Advances in Neural Information Processing Systems 34, 20596-20607, 2021
Cited by 462

Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the neural tangent kernel
S Fort, GK Dziugaite, M Paul, S Kharaghani, DM Roy, S Ganguli
Advances in Neural Information Processing Systems 33, 5850-5861, 2020
Cited by 207

LoRA learns less and forgets less
D Biderman, J Portes, JJG Ortiz, M Paul, P Greengard, C Jennings, ...
Transactions on Machine Learning Research, 2024
Cited by 104

Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression
A Raventós, M Paul, F Chen, S Ganguli
Advances in Neural Information Processing Systems 36, 14228-14246, 2023
Cited by 77

Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?
M Paul, F Chen, BW Larsen, J Frankle, S Ganguli, GK Dziugaite
arXiv preprint arXiv:2210.03044, 2022
Cited by 46

Scaling laws for precision
T Kumar, Z Ankner, BF Spector, B Bordelon, N Muennighoff, M Paul, ...
arXiv preprint arXiv:2411.04330, 2024
Cited by 20

Critique-out-loud reward models
Z Ankner, M Paul, B Cui, JD Chang, P Ammanabrolu
arXiv preprint arXiv:2408.11791, 2024
Cited by 19

Perplexed by perplexity: Perplexity-based data pruning with small reference models
Z Ankner, C Blakeney, K Sreenivasan, M Marion, ML Leavitt, M Paul
arXiv preprint arXiv:2405.20541, 2024
Cited by 19

Lottery tickets on a data diet: Finding initializations with sparse trainable networks
M Paul, B Larsen, S Ganguli, J Frankle, GK Dziugaite
Advances in Neural Information Processing Systems 35, 18916-18928, 2022
Cited by 18

Does your data spark joy? Performance gains from domain upsampling at the end of training
C Blakeney, M Paul, BW Larsen, S Owen, J Frankle
arXiv preprint arXiv:2406.03476, 2024
Cited by 8

The effects of pretraining task diversity on in-context learning of ridge regression
A Raventós, M Paul, F Chen, S Ganguli
ICLR 2023 Workshop on Mathematical and Empirical Understanding of Foundation …, 2023
Cited by 6

Perplexed by perplexity: Perplexity-based pruning with small reference models
Z Ankner, C Blakeney, K Sreenivasan, M Marion, ML Leavitt, M Paul
ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation …, 2024
Cited by 2

Predicting task forgetting in large language models
A Kleiman, J Frankle, SM Kakade, M Paul
2023
Cited by 2

Unmasking the lottery ticket hypothesis: Efficient adaptive pruning for finding winning tickets
M Paul, F Chen, BW Larsen, J Frankle, S Ganguli, GK Dziugaite
Has it Trained Yet? NeurIPS 2022 Workshop, 2022
Cited by 2

Pre-Training on a Data Diet: Identifying Sufficient Examples for Early Training
M Paul, BW Larsen, S Ganguli, J Frankle, GK Dziugaite
First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward at …, 2022
Cited by 1

μnit Scaling: Simple and Scalable FP8 LLM Training
S Narayan, A Gupta, M Paul, D Blalock
arXiv preprint arXiv:2502.05967, 2025

Soup to go: mitigating forgetting during continual learning with model averaging
A Kleiman, GK Dziugaite, J Frankle, S Kakade, M Paul
arXiv preprint arXiv:2501.05559, 2025

Deep Learning on a Diet: An Error Landscape Perspective on Parameter and Data Efficiency in Deep Learning
M Paul
Stanford University, 2023