SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation

J Wu, Z Wang, L Zhang, Y Lai, Y He, D Zhou - arXiv preprint arXiv:…, 2024 - arxiv.org
The Key-Value (KV) cache has become a bottleneck for LLMs in long-context generation.
Despite numerous efforts in this area, optimization for the decoding phase is …

Make Every Token Count: A Systematic Survey on Decoding Methods for Foundation Models

H Wang, K Shu - researchgate.net
Foundation models, such as large language models (LLMs) and large vision-language
models (LVLMs), have gained significant attention for their remarkable performance across …