A Review on Edge Large Language Models: Design, Execution, and Applications
Large language models (LLMs) have revolutionized natural language processing with their
exceptional understanding, synthesizing, and reasoning capabilities. However, deploying …
LLM inference serving: Survey of recent advances and opportunities
This survey offers a comprehensive overview of recent advancements in Large Language
Model (LLM) serving systems, focusing on research since the year 2023. We specifically …
A survey on efficient inference for large language models
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …
LoongServe: Efficiently serving long-context large language models with elastic sequence parallelism
The context window of large language models (LLMs) is rapidly increasing, leading to a
huge variance in resource usage between different requests as well as between different …
Fast distributed inference serving for large language models
Large language models (LLMs) power a new generation of interactive AI applications
exemplified by ChatGPT. The interactive nature of these applications demands low latency …
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
The rise of large language models (LLMs) has enabled LLM-based applications (aka AI
agents or co-pilots), a new software paradigm that combines the strength of LLM and …
Vidur: A large-scale simulation framework for LLM inference
Large language models (LLMs) are widely used in various domains for their ability to
perform tasks that require human-like skills. However, LLM inference is expensive today …
RAGCache: Efficient knowledge caching for retrieval-augmented generation
Retrieval-Augmented Generation (RAG) has shown significant improvements in various
natural language processing tasks by integrating the strengths of large language models …
Mooncake: A KVCache-centric disaggregated architecture for LLM serving
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. It
features a KVCache-centric disaggregated architecture that separates the prefill and …
Empowering 1000 tokens/second on-device LLM prefilling with mllm-npu
On-device large language models (LLMs) are catalyzing novel mobile applications such as
UI task automation and personalized email auto-reply, without giving away users' private …