EAGLE-2: Faster inference of language models with dynamic draft trees

Y Li, F Wei, C Zhang, H Zhang - arXiv preprint arXiv:2406.16858, 2024 - arxiv.org
Inference with modern Large Language Models (LLMs) is expensive and time-consuming,
and speculative sampling has proven to be an effective solution. Most speculative sampling …

Speculative diffusion decoding: Accelerating language generation through diffusion

JK Christopher, BR Bartoldson, B Kailkhura… - arXiv preprint arXiv …, 2024 - arxiv.org
Speculative decoding has emerged as a widely adopted method to accelerate large
language model inference without sacrificing the quality of the model outputs. While this …

GRIFFIN: Effective Token Alignment for Faster Speculative Decoding

S Hu, J Li, X **e, Z Lu, KC Toh, P Zhou - arxiv preprint arxiv:2502.11018, 2025 - arxiv.org
Speculative decoding accelerates inference in large language models (LLMs) by generating
multiple draft tokens simultaneously. However, existing methods often struggle with token …

Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference

L Zhang, Z Zhang, B Xu, S Mei, D Li - arXiv preprint arXiv:2412.18934, 2024 - arxiv.org
Due to the high resource demands of Large Language Models (LLMs), achieving
widespread deployment on consumer-grade devices presents significant challenges …

C2T: A Classifier-Based Tree Construction Method in Speculative Decoding

F Huo, J Tan, K Zhang, X Cai, S Sun - arXiv preprint arXiv:2502.13652, 2025 - arxiv.org
The growing scale of Large Language Models (LLMs) has exacerbated inference latency
and computational costs. Speculative decoding methods, which aim to mitigate these issues …

WeInfer: Unleashing the Power of WebGPU on LLM Inference in Web Browsers

Z Chen, Y Ma, S Haiyang, M Liu - THE WEB CONFERENCE 2025 - openreview.net
Web-based large language model (LLM) has garnered significant attention from both
academia and industry due to its potential to combine the benefits of on-device computation …

[PDF] Speculative Diffusion Decoding for Accelerated Language Generation

JK Christopher, BR Bartoldson, T Ben-Nun, M Cardei… - neurips2024-enlsp.github.io
Speculative decoding has emerged as a widely adopted method to accelerate large
language model inference without sacrificing the quality of the model outputs. While this …

Polybasic Speculative Decoding Under a Theoretical Perspective

R Wang, H Li, Y Ma, X Zheng, F Chao, X Xiao, R Ji - openreview.net
Speculative decoding has emerged as a critical technique for accelerating inference in large
language models, achieving significant speedups while ensuring consistency with the …

[CITATION][C] Empowering Large Language Models to Edge Intelligence: A Survey of Edge Efficient LLMs and Techniques

R Wang, Z Gao, L Zhang, S Yue, Z Gao