GaLore: Memory-efficient LLM training by gradient low-rank projection

J Zhao, Z Zhang, B Chen, Z Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Gradient descent with early stopping is provably robust to label noise for overparameterized neural networks

M Li, M Soltanolkotabi, S Oymak - … conference on artificial …, 2020 - proceedings.mlr.press
Modern neural networks are typically trained in an over-parameterized regime where the
parameters of the model far exceed the size of the training data. Such neural networks in …

Accelerating dataset distillation via model augmentation

L Zhang, J Zhang, B Lei, S Mukherjee… - Proceedings of the …, 2023 - openaccess.thecvf.com
Dataset Distillation (DD), a newly emerging field, aims at generating much smaller but
efficient synthetic training datasets from large ones. Existing DD methods based on gradient …

An investigation into neural net optimization via Hessian eigenvalue density

B Ghorbani, S Krishnan, Y Xiao - International conference on machine learning, 2019 - proceedings.mlr.press

Understanding gradient clipping in private SGD: A geometric perspective

X Chen, SZ Wu, M Hong - Advances in Neural Information …, 2020 - proceedings.neurips.cc
Deep learning models are increasingly popular in many machine learning applications
where the training data may contain sensitive information. To provide formal and rigorous …