Mobile edge intelligence for large language models: A contemporary survey
On-device large language models (LLMs), referring to running LLMs on edge devices, have
raised considerable interest since they are more cost-effective, latency-efficient, and privacy …
Tool learning with large language models: A survey
Recently, tool learning with large language models (LLMs) has emerged as a promising
paradigm for augmenting the capabilities of LLMs to tackle highly complex problems …
SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification
This paper introduces SpecInfer, a system that accelerates generative large language model
(LLM) serving with tree-based speculative inference and verification. The key idea behind …
SpotServe: Serving generative large language models on preemptible instances
The high computational and memory requirements of generative large language models
(LLMs) make it challenging to serve them cheaply. This paper aims to reduce the monetary …
Efficient and green large language models for software engineering: Vision and the road ahead
Large Language Models (LLMs) have recently shown remarkable capabilities in various
software engineering tasks, spurring the rapid growth of the Large Language Models for …
Break the sequential dependency of LLM inference using lookahead decoding
Autoregressive decoding of large language models (LLMs) is memory bandwidth bounded,
resulting in high latency and significant wastes of the parallel processing power of modern …
From decoding to meta-generation: Inference-time algorithms for large language models
One of the most striking findings in modern research on large language models (LLMs) is
that scaling up compute during training leads to better results. However, less attention has …
Large language models and games: A survey and roadmap
Recent years have seen an explosive increase in research on large language models
(LLMs), and accompanying public engagement on the topic. While starting as a niche area …
LLM inference serving: Survey of recent advances and opportunities
This survey offers a comprehensive overview of recent advancements in Large Language
Model (LLM) serving systems, focusing on research since the year 2023. We specifically …