Tabi: An efficient multi-level inference system for large language models

Y Wang, K Chen, H Tan, K Guo - Proceedings of the Eighteenth …, 2023 - dl.acm.org
Today's trend of building ever larger language models (LLMs), while pushing the
performance of natural language processing, adds significant latency to the inference stage …

Lcs: Alleviating total cold start latency in serverless applications with lru warm container approach

B Sethi, SK Addya, SK Ghosh - … of the 24th International Conference on …, 2023 - dl.acm.org
Serverless computing offers "Function-as-a-Service" (FaaS), which promotes an application
in the form of independent granular components called functions. FaaS goes well as a …

Performance and cost comparison of cloud services for deep learning workload

D Chahal, M Mishra, S Palepu, R Singhal - Companion of the ACM …, 2021 - dl.acm.org
Many organizations are migrating their on-premise artificial intelligence workloads to the
cloud due to the availability of cost-effective and highly scalable infrastructure, software and …

Shipping code towards data in an inter-region serverless environment to leverage latency

B Sethi, SK Addya, J Bhutada, SK Ghosh - The Journal of Supercomputing, 2023 - Springer
Serverless computing emerges as a new standard to build cloud applications, where
developers write compact functions that respond to events in the cloud infrastructure …