Scalable iterative pruning of large language and vision models using block coordinate descent
Pruning neural networks, which involves removing a fraction of their weights, can often
maintain high accuracy while significantly reducing model complexity, at least up to a certain …