Deep learning workload scheduling in GPU datacenters: A survey

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

AlpaServe: Statistical multiplexing with model parallelism for deep learning serving

Z Li, L Zheng, Y Zhong, V Liu, Y Sheng, X Jin… - … USENIX Symposium on …, 2023 - usenix.org
Model parallelism is conventionally viewed as a method to scale a single large deep
learning model beyond the memory limits of a single device. In this paper, we demonstrate …
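
A toy Python sketch of the statistical-multiplexing intuition behind this result (not AlpaServe's actual placement algorithm; the model counts, burst pattern, and variable names are all hypothetical): if each bursty model is sharded across every GPU instead of owning a dedicated one, each GPU sees the average of the request streams rather than one model's spikes.

import random
random.seed(0)
NUM_GPUS = NUM_MODELS = 4
STEPS = 10_000

hot_dedicated = hot_shared = 0.0
for _ in range(STEPS):
    # Bursty per-model load: usually idle, occasionally a full-GPU spike.
    loads = [random.choice([0.0, 0.0, 0.0, 1.0]) for _ in range(NUM_MODELS)]
    hot_dedicated += max(loads)          # model i owns GPU i: the hottest GPU carries a full spike
    hot_shared += sum(loads) / NUM_GPUS  # model-parallel sharding: every GPU sees the average load

print(f"avg load on hottest GPU, dedicated:      {hot_dedicated / STEPS:.2f}")  # roughly 0.68
print(f"avg load on hottest GPU, model-parallel: {hot_shared / STEPS:.2f}")     # roughly 0.25

The lower hot-GPU load under sharding is what lets the same fleet absorb bursts with fewer devices.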

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …

Microsecond-scale preemption for concurrent GPU-accelerated DNN inferences

M Han, H Zhang, R Chen, H Chen - 16th USENIX Symposium on …, 2022 - usenix.org
Many intelligent applications like autonomous driving and virtual reality require running both
latency-critical and best-effort DNN inference tasks to achieve both real time and work …
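
A minimal two-priority scheduler sketch in Python to illustrate the latency-critical versus best-effort split the abstract describes (hypothetical code; the paper's contribution is preempting kernels on the GPU itself at microsecond scale, which a host-side queue like this cannot do):

import heapq

class TwoPriorityScheduler:
    def __init__(self):
        self._queue = []  # (priority, seq, task); 0 = latency-critical, 1 = best-effort
        self._seq = 0

    def submit(self, task, latency_critical=False):
        heapq.heappush(self._queue, (0 if latency_critical else 1, self._seq, task))
        self._seq += 1

    def run(self):
        while self._queue:
            _, _, task = heapq.heappop(self._queue)
            task()  # best-effort tasks run only when no latency-critical task is waiting

sched = TwoPriorityScheduler()
sched.submit(lambda: print("best-effort analytics inference"))
sched.submit(lambda: print("autonomous-driving inference"), latency_critical=True)
sched.run()  # the latency-critical task runs first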

Fast distributed inference serving for large language models

B Wu, Y Zhong, Z Zhang, S Liu, F Liu, Y Sun… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) power a new generation of interactive AI applications
exemplified by ChatGPT. The interactive nature of these applications demands low latency …

Beware of fragmentation: Scheduling GPU-sharing workloads with fragmentation gradient descent

Q Weng, L Yang, Y Yu, W Wang, X Tang… - 2023 USENIX Annual …, 2023 - usenix.org
Large tech companies are piling up a massive number of GPUs in their server fleets to run
diverse machine learning (ML) workloads. However, these expensive devices often suffer …
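
A simplified Python sketch of fragmentation-aware placement (the measure below, unusable leftover GPU fractions, and the node names are hypothetical; the paper defines a statistical fragmentation metric and schedules along its steepest-descent direction): each candidate GPU slot is scored by how much the placement would change cluster fragmentation, and the lowest-scoring slot wins.

def fragmentation(free_gpus, typical_request=0.6):
    # Leftover capacity too small for a typical request counts as fragmented.
    return sum(f for f in free_gpus if 0 < f < typical_request)

def place(nodes, demand):
    best, best_delta = None, float("inf")
    for name, free in nodes.items():
        for i, f in enumerate(free):
            if f >= demand:
                after = free[:i] + [f - demand] + free[i + 1:]
                delta = fragmentation(after) - fragmentation(free)
                if delta < best_delta:
                    best, best_delta = (name, i), delta
    return best

nodes = {"node-a": [1.0, 0.6], "node-b": [0.5, 0.5]}
print(place(nodes, 0.5))  # ('node-b', 0): filling a half-free GPU removes fragmentation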

LLMCad: Fast and scalable on-device large language model inference

D Xu, W Yin, X Jin, Y Zhang, S Wei, M Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative tasks, such as text generation and question answering, hold a crucial position in
the realm of mobile applications. Due to their sensitivity to privacy concerns, there is a …

Orion: Interference-aware, fine-grained GPU sharing for ML applications

F Strati, X Ma, A Klimovic - … of the Nineteenth European Conference on …, 2024 - dl.acm.org
GPUs are critical for maximizing the throughput-per-Watt of deep neural network (DNN)
applications. However, DNN applications often underutilize GPUs, even when using large …
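
A hypothetical Python sketch of the interference-aware idea (the bottleneck labels and kernel names are illustrative, not Orion's profiling or scheduling machinery, which operates on CUDA streams): co-schedule a best-effort kernel next to the running high-priority one only when their dominant bottlenecks differ, e.g. compute-bound next to memory-bound.

def can_colocate(high_priority_kernel, best_effort_kernel):
    # Kernels contending for the same resource interfere; mixed profiles share well.
    return high_priority_kernel["bottleneck"] != best_effort_kernel["bottleneck"]

running = {"name": "resnet_conv", "bottleneck": "compute"}
candidates = [
    {"name": "embedding_lookup", "bottleneck": "memory"},
    {"name": "dense_gemm", "bottleneck": "compute"},
]
for kernel in candidates:
    verdict = "co-schedule" if can_colocate(running, kernel) else "defer"
    print(f"{kernel['name']}: {verdict}")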

Power-aware Deep Learning Model Serving with μ-Serve

H Qiu, W Mao, A Patke, S Cui, S Jha, C Wang… - 2024 USENIX Annual …, 2024 - usenix.org
With the increasing popularity of large deep learning model-serving workloads, there is a
pressing need to reduce the energy consumption of a model-serving cluster while …

Transparent GPU sharing in container clouds for deep learning workloads

B Wu, Z Zhang, Z Bai, X Liu, X Jin - 20th USENIX Symposium on …, 2023 - usenix.org
Containers are widely used for resource management in datacenters. A common practice to
support deep learning (DL) training in container clouds is to statically bind GPUs to …