InferCept: Efficient intercept support for augmented large language model inference R Abhyankar, Z He, V Srivatsa, H Zhang, Y Zhang arXiv preprint arXiv:2402.01869, 2024 | 12 | 2024 |
APIServe: Efficient API support for large-language model inferencing R Abhyankar, Z He, V Srivatsa, H Zhang, Y Zhang arXiv e-prints, arXiv:2402.01869, 2024 | 7 | 2024 |
The effect of model size on worst-group generalization A Pham, E Chan, V Srivatsa, D Ghosh, Y Yang, Y Yu, R Zhong, ... arXiv preprint arXiv:2112.04094, 2021 | 5 | 2021 |
Preble: Efficient distributed prompt scheduling for LLM serving V Srivatsa, Z He, R Abhyankar, D Li, Y Zhang arXiv preprint arXiv:2407.00023, 2024 | 4 | 2024 |
Cognify: Supercharging Gen-AI Workflows With Hierarchical Autotuning Z He, R Abhyankar, V Srivatsa, Y Zhang arXiv preprint arXiv:2502.08056, 2025 | | 2025 |