- Academic Search

A theoretical perspective for speculative decoding algorithm

M Yin, M Chen, K Huang… - Advances in Neural …, 2025 - proceedings.neurips.cc

Transformer-based autoregressive sampling has been the major bottleneck for slowing
down large language model inferences. One effective way to accelerate inference is …

Cited by 2; 5 versions

[PDF] arxiv.org

Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding

W Zhao, Y Huang, X Han, W Xu, C Xiao… - arXiv preprint arXiv …, 2024 - arxiv.org

Speculative decoding is a widely used method that accelerates the generation process of
large language models (LLMs) with no compromise in model performance. It achieves this …

Cited by 1; 3 versions

[PDF] arxiv.org

Accelerating the inference of string generation-based chemical reaction models for industrial applications

M Andronov, N Andronova, M Wand… - arXiv preprint arXiv …, 2024 - arxiv.org

Template-free SMILES-to-SMILES translation models for reaction prediction and single-step
retrosynthesis are of interest for industrial applications in computer-aided synthesis planning …

Cited by 1; 4 versions

BASS: Batched attention-optimized speculative sampling

A theoretical perspective for speculative decoding algorithm
