Eagle-2: Faster inference of language models with dynamic draft trees
Inference with modern Large Language Models (LLMs) is expensive and time-consuming,
and speculative sampling has proven to be an effective solution. Most speculative sampling …
Speculative diffusion decoding: Accelerating language generation through diffusion
Speculative decoding has emerged as a widely adopted method to accelerate large
language model inference without sacrificing the quality of the model outputs. While this …
GRIFFIN: Effective Token Alignment for Faster Speculative Decoding
Speculative decoding accelerates inference in large language models (LLMs) by generating
multiple draft tokens simultaneously. However, existing methods often struggle with token …
Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference
L Zhang, Z Zhang, B Xu, S Mei, D Li - arXiv preprint arXiv:2412.18934, 2024 - arxiv.org
Due to the high resource demands of Large Language Models (LLMs), achieving
widespread deployment on consumer-grade devices presents significant challenges …
C2T: A Classifier-Based Tree Construction Method in Speculative Decoding
F Huo, J Tan, K Zhang, X Cai, S Sun - arXiv preprint arXiv:2502.13652, 2025 - arxiv.org
The growing scale of Large Language Models (LLMs) has exacerbated inference latency
and computational costs. Speculative decoding methods, which aim to mitigate these issues …
WeInfer: Unleashing the Power of WebGPU on LLM Inference in Web Browsers
Z Chen, Y Ma, S Haiyang, M Liu - THE WEB CONFERENCE 2025 - openreview.net
Web-based large language model (LLM) has garnered significant attention from both
academia and industry due to its potential to combine the benefits of on-device computation …
[PDF] Speculative Diffusion Decoding for Accelerated Language Generation
Speculative decoding has emerged as a widely adopted method to accelerate large
language model inference without sacrificing the quality of the model outputs. While this …
Polybasic Speculative Decoding Under a Theoretical Perspective
Speculative decoding has emerged as a critical technique for accelerating inference in large
language models, achieving significant speedups while ensuring consistency with the …
[CITATION][C] Empowering Large Language Models to Edge Intelligence: A Survey of Edge Efficient LLMs and Techniques
R Wang, Z Gao, L Zhang, S Yue, Z Gao
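The entries above all revolve around speculative decoding: a cheap draft model proposes several tokens, and the expensive target model verifies them in parallel, accepting each draft token with probability min(1, p_target/p_draft) and resampling on rejection. A minimal self-contained sketch of that acceptance loop, using toy hard-coded token probabilities in place of real draft and target models (all names here are illustrative, not from any of the papers listed):

```python
import random

# Toy stand-ins for a cheap draft model and an expensive target model:
# probability each model assigns to emitting token 1 (vs. token 0).
def draft_prob(token):
    return 0.7 if token == 1 else 0.3

def target_prob(token):
    return 0.6 if token == 1 else 0.4

def speculative_step(rng, num_draft=4):
    """One round of speculative decoding over a binary vocabulary.

    The draft model proposes up to num_draft tokens; each is accepted
    with probability min(1, p_target / p_draft). On the first rejection
    we resample from the target model (a simplification of sampling
    from the residual distribution) and stop the round.
    """
    accepted = []
    for _ in range(num_draft):
        tok = 1 if rng.random() < draft_prob(1) else 0
        if rng.random() < min(1.0, target_prob(tok) / draft_prob(tok)):
            accepted.append(tok)  # draft token verified by the target
        else:
            # Rejection: fall back to the target model's own sample.
            accepted.append(1 if rng.random() < target_prob(1) else 0)
            break
    return accepted

rng = random.Random(0)
tokens = speculative_step(rng)
print(tokens)
```

The speedup in the surveyed papers comes from the fact that verifying `num_draft` tokens costs one target-model forward pass instead of `num_draft` sequential ones; tree-based variants such as EAGLE-2 and C2T generalize the single draft sequence above to a tree of candidate continuations.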