A survey on efficient inference for large language models
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …
Compute or load KV cache? Why not both?
Recent advancements in Large Language Models (LLMs) have significantly increased
context window sizes, enabling sophisticated applications but also introducing substantial …
An LLM-Tool Compiler for Fused Parallel Function Calling
State-of-the-art sequential reasoning in Large Language Models (LLMs) has expanded the
capabilities of Copilots beyond conversational tasks to complex function calling, managing …
Eagle: Efficient training-free router for multi-LLM inference
The proliferation of Large Language Models (LLMs) with varying capabilities and costs has
created a need for efficient model selection in AI systems. LLM routers address this need by …
Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution
The complexity of large language model (LLM) serving workloads has substantially
increased due to the integration with external tool invocations, such as ChatGPT plugins. In …
Bridging Data and Hardware Gap for Efficient Machine Learning Model Scaling
H Zheng - 2024 - deepblue.lib.umich.edu
Recent research in deep learning models has achieved astonishing progress in various
domains, like image classification, text generation, and image generation. With the …
Tutorial Proposal: Efficient Inference for Large Language Models–Algorithm, Model, and System
Background. Large Language Models (LLMs) have attracted significant attention from both
academia and industry in recent years. They are revolutionizing many applications …