A survey on efficient inference for large language models

Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …

Compute or load KV cache? Why not both?

S Jin, X Liu, Q Zhang, ZM Mao - arXiv preprint arXiv:2410.03065, 2024 - arxiv.org
Recent advancements in Large Language Models (LLMs) have significantly increased
context window sizes, enabling sophisticated applications but also introducing substantial …

An LLM-Tool Compiler for Fused Parallel Function Calling

S Singh, A Karatzas, M Fore, I Anagnostopoulos… - arXiv preprint arXiv …, 2024 - arxiv.org
State-of-the-art sequential reasoning in Large Language Models (LLMs) has expanded the
capabilities of Copilots beyond conversational tasks to complex function calling, managing …

Eagle: Efficient training-free router for multi-LLM inference

Z Zhao, S Jin, ZM Mao - arXiv preprint arXiv:2409.15518, 2024 - arxiv.org
The proliferation of Large Language Models (LLMs) with varying capabilities and costs has
created a need for efficient model selection in AI systems. LLM routers address this need by …

Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution

Y Xu, X Kong, T Chen, D Zhuo - arXiv preprint arXiv:2406.00059, 2024 - arxiv.org
The complexity of large language model (LLM) serving workloads has substantially
increased due to the integration with external tool invocations, such as ChatGPT plugins. In …

Bridging Data and Hardware Gap for Efficient Machine Learning Model Scaling

H Zheng - 2024 - deepblue.lib.umich.edu
Recent research in deep learning models has achieved astonishing progress in various
domains, like image classification, text generation, and image generation. With the …

Tutorial Proposal: Efficient Inference for Large Language Models–Algorithm, Model, and System

X Ning, G Dai, H Bai, L Hou, Y Wang, Q Liu - nics-effalg.com
Background. Large Language Models (LLMs) have attracted significant attention from both
academia and industry in recent years. They are revolutionizing many applications …