Google Scholar

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org

Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

Cited by 23 (4 versions)

AlpaServe: Statistical multiplexing with model parallelism for deep learning serving

Z Li, L Zheng, Y Zhong, V Liu, Y Sheng, X Jin… - … USENIX Symposium on …, 2023 - usenix.org

Model parallelism is conventionally viewed as a method to scale a single large deep
learning model beyond the memory limits of a single device. In this paper, we demonstrate …

Cited by 135 (4 versions)

Fast distributed inference serving for large language models

B Wu, Y Zhong, Z Zhang, S Liu, F Liu, Y Sun… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs) power a new generation of interactive AI applications
exemplified by ChatGPT. The interactive nature of these applications demands low latency …

Cited by 80 (2 versions)

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H Jin… - arXiv preprint arXiv …, 2023 - arxiv.org

In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …

Cited by 72 (2 versions)

Microsecond-scale preemption for concurrent GPU-accelerated DNN inferences

M Han, H Zhang, R Chen, H Chen - 16th USENIX Symposium on …, 2022 - usenix.org

Many intelligent applications like autonomous driving and virtual reality require running both
latency-critical and best-effort DNN inference tasks to achieve both real time and work …

Cited by 117 (3 versions)

Defending batch-level label inference and replacement attacks in vertical federated learning

T Zou, Y Liu, Y Kang, W Liu, Y He, Z Yi… - … Transactions on Big …, 2022 - ieeexplore.ieee.org

In a vertical federated learning (VFL) scenario where features and models are split into
different parties, it has been shown that sample-level gradient information can be exploited …

Cited by 74 (6 versions)

Power-aware Deep Learning Model Serving with μ-Serve

H Qiu, W Mao, A Patke, S Cui, S Jha, C Wang… - 2024 USENIX Annual …, 2024 - usenix.org

With the increasing popularity of large deep learning model-serving workloads, there is a
pressing need to reduce the energy consumption of a model-serving cluster while …

Cited by 11 (2 versions)

Liquid: Intelligent resource estimation and network-efficient scheduling for deep learning jobs on distributed GPU clusters

R Gu, Y Chen, S Liu, H Dai, G Chen… - … on Parallel and …, 2021 - ieeexplore.ieee.org

Deep learning (DL) is becoming increasingly popular in many domains, including computer
vision, speech recognition, self-driving automobiles, etc. GPU can train DL models efficiently …

Cited by 71 (4 versions)

Bamboo: Making preemptible instances resilient for affordable training of large DNNs

J Thorpe, P Zhao, J Eyolfson, Y Qiao, Z Jia… - … USENIX Symposium on …, 2023 - usenix.org

DNN models across many domains continue to grow in size, resulting in high resource
requirements for effective training, and unpalatable (and often unaffordable) costs for …

Cited by 63 (14 versions)

Transparent GPU sharing in container clouds for deep learning workloads

B Wu, Z Zhang, Z Bai, X Liu, X Jin - 20th USENIX Symposium on …, 2023 - usenix.org

Containers are widely used for resource management in datacenters. A common practice to
support deep learning (DL) training in container clouds is to statically bind GPUs to …

Cited by 43 (5 versions)

PipeSwitch: Fast pipelined context switching for deep learning applications

Deep learning workload scheduling in gpu datacenters: A survey