Amey Agrawal
PhD Student at Georgia Tech
Verified email at gatech.edu
Title
Cited by
Year
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
A Agrawal, N Kedia, A Panwar, J Mohan, N Kwatra, B Gulavani, ...
18th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2024
Cited by: 91, Year: 2024
Sarathi: Efficient LLM inference by piggybacking decodes with chunked prefills
A Agrawal, A Panwar, J Mohan, N Kwatra, BS Gulavani, R Ramjee
arXiv preprint arXiv:2308.16369, 2023
Cited by: 79, Year: 2023
Singularity: Planet-scale, preemptive and elastic scheduling of AI workloads
D Shukla, M Sivathanu, S Viswanatha, B Gulavani, R Nehme, A Agrawal, ...
arXiv preprint arXiv:2202.07848, 2022
Cited by: 33, Year: 2022
Logan: A distributed online log parser
A Agrawal, R Karlupia, R Gupta
2019 IEEE 35th International Conference on Data Engineering (ICDE), 1946-1951, 2019
Cited by: 30, Year: 2019
Vidur: A Large-Scale Simulation Framework For LLM Inference
A Agrawal, N Kedia, J Mohan, A Panwar, N Kwatra, B Gulavani, ...
Proceedings of Machine Learning and Systems 6, 351-366, 2024
Cited by: 22, Year: 2024
Delog: A high-performance privacy preserving log filtering framework
A Agrawal, A Dixit, NA Shettar, D Kapadia, V Agrawal, R Gupta, ...
2019 IEEE International Conference on Big Data (Big Data), 1739-1748, 2019
Cited by: 8*, Year: 2019
Etalon: Holistic Performance Evaluation Framework for LLM Inference Systems
A Agrawal, A Agarwal, N Kedia, J Mohan, S Kundu, N Kwatra, R Ramjee, ...
arXiv preprint arXiv:2407.07000, 2024
Cited by: 5, Year: 2024
Inshrinkerator: Compressing Deep Learning Training Checkpoints via Dynamic Quantization
A Agrawal, S Reddy, S Bhattamishra, VPS Nookala, V Vashishth, K Rong, ...
Proceedings of the 2024 ACM Symposium on Cloud Computing, 1012-1031, 2024
Cited by: 3*, Year: 2024
Mnemosyne: Parallelization strategies for efficiently serving multi-million context length LLM inference requests without approximations
A Agrawal, J Chen, Í Goiri, R Ramjee, C Zhang, A Tumanov, E Choukse
arXiv preprint arXiv:2409.17264, 2024
Cited by: 3, Year: 2024
Elastically managing workers of multi-worker workloads on accelerator devices
M Sivathanu, S Viswanatha, B Gulavani, DK Shukla, RV Nehme, ...
US Patent App. 17/855,722, 2023
Year: 2023
Learning Digital Circuits: A Journey Through Weight Invariant Self-Pruning Neural Networks
A Agrawal, R Karlupia
arXiv preprint arXiv:1909.00052, 2019
Year: 2019