A theoretical perspective for speculative decoding algorithm

M Yin, M Chen, K Huang… - Advances in Neural …, 2025 - proceedings.neurips.cc
Transformer-based autoregressive sampling has been the major bottleneck for slowing
down large language model inferences. One effective way to accelerate inference is …

Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding

W Zhao, Y Huang, X Han, W Xu, C Xiao… - arXiv preprint arXiv …, 2024 - arxiv.org
Speculative decoding is a widely used method that accelerates the generation process of
large language models (LLMs) with no compromise in model performance. It achieves this …

Accelerating the inference of string generation-based chemical reaction models for industrial applications

M Andronov, N Andronova, M Wand… - arXiv preprint arXiv …, 2024 - arxiv.org
Template-free SMILES-to-SMILES translation models for reaction prediction and single-step
retrosynthesis are of interest for industrial applications in computer-aided synthesis planning …