A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Understanding LLMs: A comprehensive overview from training to inference

Y Liu, H He, T Han, X Zhang, M Liu, J Tian… - arXiv preprint arXiv …, 2024 - arxiv.org
The introduction of ChatGPT has led to a significant increase in the utilization of Large
Language Models (LLMs) for addressing downstream tasks. There is an increasing focus on …

PowerInfer: Fast large language model serving with a consumer-grade GPU

Y Song, Z Mi, H Xie, H Chen - Proceedings of the ACM SIGOPS 30th …, 2024 - dl.acm.org
This paper introduces PowerInfer, a high-speed Large Language Model (LLM) inference
engine on a personal computer (PC) equipped with a single consumer-grade GPU. The key …

Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng, J Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …

Unlocking efficiency in large language model inference: A comprehensive survey of speculative decoding

H Xia, Z Yang, Q Dong, P Wang, Y Li, T Ge… - arXiv preprint arXiv …, 2024 - arxiv.org
To mitigate the high inference latency stemming from autoregressive decoding in Large
Language Models (LLMs), Speculative Decoding has emerged as a novel decoding …
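
The snippet breaks off before the mechanism, so a rough illustration may help: speculative decoding drafts several tokens with a cheap model, then verifies them with the large target model in one pass. The sketch below is a toy in that spirit; draft_next and target_probs are hypothetical stand-ins, and the acceptance rule is simplified relative to the real distribution-matching test.

    import random

    random.seed(0)
    VOCAB = list(range(8))

    def draft_next(context):
        # Hypothetical small draft model: cheap but inexact next-token sampler.
        return random.choice(VOCAB)

    def target_probs(context):
        # Hypothetical large target model: exact next-token distribution.
        weights = [t + 1 for t in VOCAB]
        total = sum(weights)
        return [w / total for w in weights]

    def speculative_step(context, gamma=4):
        # 1) Draft gamma tokens autoregressively with the cheap model.
        drafted, ctx = [], list(context)
        for _ in range(gamma):
            tok = draft_next(ctx)
            drafted.append(tok)
            ctx.append(tok)
        # 2) Keep the prefix of drafted tokens the target model accepts
        #    (a toy rule; real verification compares the two distributions).
        accepted, ctx = [], list(context)
        for tok in drafted:
            p = target_probs(ctx)
            if random.random() < min(1.0, p[tok] * len(VOCAB)):
                accepted.append(tok)
                ctx.append(tok)
            else:
                # 3) On the first rejection, emit one token sampled from the
                #    target model instead, so the step still makes progress.
                accepted.append(random.choices(VOCAB, weights=p)[0])
                break
        return accepted

    print(speculative_step([1, 2, 3]))

The point of the loop is that one target-model pass can validate several drafted tokens, amortizing the expensive model over multiple output positions.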

Fairness in serving large language models

Y Sheng, S Cao, D Li, B Zhu, Z Li, D Zhuo… - … USENIX Symposium on …, 2024 - usenix.org
High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of
requests from short chat conversations to long document reading. To ensure that all client …

Skeleton-of-thought: Large language models can do parallel decoding

X Ning, Z Lin, Z Zhou, Z Wang, H Yang… - Proceedings ENLSP …, 2023 - lirias.kuleuven.be
This work aims at decreasing the end-to-end generation latency of large language models
(LLMs). One of the major causes of the high generation latency is the sequential decoding …
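
The sequential decoding the snippet names as the bottleneck is what skeleton-of-thought sidesteps: first elicit a short skeleton of points, then expand the points concurrently instead of decoding one long answer token by token. A minimal sketch of that two-stage flow, where generate is a hypothetical placeholder for an LLM call and the skeleton parsing is elided:

    from concurrent.futures import ThreadPoolExecutor

    def generate(prompt):
        # Hypothetical stand-in for a concurrent or batched LLM completion call.
        return f"[completion for: {prompt!r}]"

    def skeleton_of_thought(question):
        # Stage 1: one short, cheap request for the answer's skeleton.
        skeleton = generate(f"List 3 brief bullet points answering: {question}")
        points = ["point 1", "point 2", "point 3"]  # parsed from skeleton in practice
        # Stage 2: expand every point in parallel rather than sequentially.
        with ThreadPoolExecutor() as pool:
            expansions = pool.map(
                lambda p: generate(f"Expand {p!r} for the question: {question}"),
                points,
            )
        return "\n".join(expansions)

    print(skeleton_of_thought("Why is autoregressive decoding slow?"))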

dLoRA: Dynamically orchestrating requests and adapters for LoRA LLM serving

B Wu, R Zhu, Z Zhang, P Sun, X Liu, X Jin - 18th USENIX Symposium on …, 2024 - usenix.org
Low-rank adaptation (LoRA) is a popular approach to finetune pre-trained large language
models (LLMs) to specific domains. This paper introduces dLoRA, an inference serving …
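
For context on what is being multiplexed here: LoRA freezes the pretrained weight W and trains only a rank-r update, giving an effective weight W + (alpha / r) * B @ A. A minimal numpy sketch with illustrative shapes (not dLoRA's implementation):

    import numpy as np

    d_out, d_in, r, alpha = 6, 4, 2, 8
    rng = np.random.default_rng(0)

    W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
    A = 0.01 * rng.normal(size=(r, d_in))  # trainable down-projection
    B = np.zeros((d_out, r))               # trainable up-projection, zero-initialized

    def lora_forward(x, A, B):
        # Base path plus the scaled low-rank path; only A and B are fine-tuned.
        return W @ x + (alpha / r) * (B @ (A @ x))

    x = rng.normal(size=(d_in,))
    print(lora_forward(x, A, B))  # equals W @ x while B is still zero

Because W is shared, a serving system can keep one copy of the base model and swap per-request (A, B) pairs in and out, which is the orchestration problem the entry describes.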

Medusa: Simple LLM inference acceleration framework with multiple decoding heads

T Cai, Y Li, Z Geng, H Peng, JD Lee, D Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
The inference process in Large Language Models (LLMs) is often limited due to the absence
of parallelism in the auto-regressive decoding process, resulting in most operations being …
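
As a toy illustration of the title's "multiple decoding heads": extra linear heads read the same last hidden state and each guesses a token a few positions ahead, and the base model then checks the guesses in a single pass. Everything below (shapes, weights, the hard-coded verification stub) is illustrative, not the paper's code:

    import numpy as np

    hidden, vocab, num_heads = 8, 16, 3
    rng = np.random.default_rng(0)

    # One extra linear head per lookahead offset (head k guesses position +k+1).
    head_weights = [rng.normal(size=(vocab, hidden)) for _ in range(num_heads)]

    def propose(h):
        # All heads read the same hidden state h, so proposals add no extra
        # autoregressive steps.
        return [int(np.argmax(Wk @ h)) for Wk in head_weights]

    def verify(context, candidates):
        # Stand-in for one batched base-model pass that keeps the longest
        # prefix of candidates it would also have produced (stubbed here).
        return candidates[:2]

    h = rng.normal(size=(hidden,))
    print(verify([1, 2, 3], propose(h)))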

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …