Tabi: An efficient multi-level inference system for large language models
Today's trend of building ever larger language models (LLMs), while pushing the
performance of natural language processing, adds significant latency to the inference stage …
LCS: Alleviating total cold start latency in serverless applications with LRU warm container approach
Serverless computing offers" Function-as-a-Service"(FaaS), which promotes an application
in the form of independent granular components called functions. FaaS goes well as a …
Performance and cost comparison of cloud services for deep learning workload
Many organizations are migrating their on-premise artificial intelligence workloads to the
cloud due to the availability of cost-effective and highly scalable infrastructure, software and …
Shipping code towards data in an inter-region serverless environment to leverage latency
Serverless computing is emerging as a new standard for building cloud applications, where
developers write compact functions that respond to events in the cloud infrastructure …