| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| GPTQ: Accurate post-training quantization for generative pre-trained transformers | E Frantar, S Ashkboos, T Hoefler, D Alistarh | arXiv preprint arXiv:2210.17323 | 1066 | 2022 |
| SparseGPT: Massive language models can be accurately pruned in one-shot | E Frantar, D Alistarh | International Conference on Machine Learning, 10323-10337 | 575 | 2023 |
| Optimal brain compression: A framework for accurate post-training quantization and pruning | E Frantar, D Alistarh | Advances in Neural Information Processing Systems 35, 4475-4488 | 225 | 2022 |
| SpQR: A sparse-quantized representation for near-lossless LLM weight compression | T Dettmers, R Svirschevski, V Egiazarian, D Kuznedelev, E Frantar, ... | arXiv preprint arXiv:2306.03078 | 212 | 2023 |
| The optimal BERT surgeon: Scalable and accurate second-order pruning for large language models | E Kurtic, D Campos, T Nguyen, E Frantar, M Kurtz, B Fineran, M Goin, ... | arXiv preprint arXiv:2203.07259 | 134 | 2022 |
| Extreme compression of large language models via additive quantization | V Egiazarian, A Panferov, D Kuznedelev, E Frantar, A Babenko, D Alistarh | arXiv preprint arXiv:2401.06118 | 66 | 2024 |
| M-FAC: Efficient matrix-free approximations of second-order information | E Frantar, E Kurtic, D Alistarh | Advances in Neural Information Processing Systems 34, 14873-14886 | 60 | 2021 |
| ZipLM: Hardware-aware structured pruning of language models | E Kurtic, E Frantar, D Alistarh | arXiv preprint arXiv:2302.04089 | 56* | 2023 |
| SPDY: Accurate pruning with speedup guarantees | E Frantar, D Alistarh | International Conference on Machine Learning, 6726-6743 | 43 | 2022 |
| QUIK: Towards end-to-end 4-bit inference on generative large language models | S Ashkboos, I Markov, E Frantar, T Zhong, X Wang, J Ren, T Hoefler, ... | arXiv preprint arXiv:2310.09259 | 27 | 2023 |
| On the sample complexity of adversarial multi-source PAC learning | N Konstantinov, E Frantar, D Alistarh, C Lampert | International Conference on Machine Learning, 5416-5425 | 24 | 2020 |
| QMoE: Sub-1-bit compression of trillion-parameter models | E Frantar, D Alistarh | Proceedings of Machine Learning and Systems 6, 439-451 | 23* | 2024 |
| Marlin: A fast 4-bit inference kernel for medium batch sizes | E Frantar, D Alistarh | | 22* | 2024 |
| Scaling laws for sparsely-connected foundation models | E Frantar, C Riquelme, N Houlsby, D Alistarh, U Evci | arXiv preprint arXiv:2309.08520 | 21 | 2023 |
| CAP: Correlation-aware pruning for highly-accurate sparse vision models | D Kuznedelev, E Kurtic, E Frantar, D Alistarh | Advances in Neural Information Processing Systems 36, 28805-28831 | 16* | 2023 |
| Sparse fine-tuning for inference acceleration of large language models | E Kurtic, D Kuznedelev, E Frantar, M Goin, D Alistarh | arXiv preprint arXiv:2310.06927 | 16 | 2023 |
| L-GreCo: Layerwise-adaptive gradient compression for efficient data-parallel deep learning | I Markov, K Alim, E Frantar, D Alistarh | Proceedings of Machine Learning and Systems 6, 312-324 | 8* | 2024 |
| Accurate neural network pruning requires rethinking sparse optimization | D Kuznedelev, E Kurtic, E Iofinova, E Frantar, A Peste, D Alistarh | arXiv preprint arXiv:2308.02060 | 8 | 2023 |
| QIGen: Generating efficient kernels for quantized inference on large language models | T Pegolotti, E Frantar, D Alistarh, M Püschel | arXiv preprint arXiv:2307.03738 | 8* | 2023 |
| JaxPruner: A concise library for sparsity research | JH Lee, W Park, NE Mitchell, J Pilault, JSO Ceron, HB Kim, N Lee, ... | Conference on Parsimony and Learning, 515-528 | 6 | 2024 |

\* Asterisked citation counts may include citations to merged versions of the article (Google Scholar convention).