MInference 1.0: Accelerating pre-filling for long-context LLMs via dynamic sparse attention

H Jiang, Y Li, C Zhang, Q Wu, X Luo, S Ahn… - arXiv preprint arXiv …, 2024 - arxiv.org
The computational challenges of Large Language Model (LLM) inference remain a
significant barrier to their widespread deployment, especially as prompt lengths continue to …

EAGLE-2: Faster inference of language models with dynamic draft trees

Y Li, F Wei, C Zhang, H Zhang - arXiv preprint arXiv:2406.16858, 2024 - arxiv.org
Inference with modern Large Language Models (LLMs) is expensive and time-consuming,
and speculative sampling has proven to be an effective solution. Most speculative sampling …
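The snippet names speculative sampling as the acceleration strategy. As a rough orientation only, here is a minimal sketch of the generic draft-and-verify loop (not EAGLE-2's dynamic draft trees); the toy stand-in distributions, vocabulary size, and parameter names are illustrative assumptions, not anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 50  # toy vocabulary size (illustrative)

def toy_dist(ctx, temperature):
    """Stand-in for a model's next-token distribution; deterministic in the
    context so repeated calls on the same prefix agree."""
    seed = hash(tuple(ctx)) % (2**32)
    logits = np.random.default_rng(seed).normal(size=VOCAB) / temperature
    e = np.exp(logits - logits.max())
    return e / e.sum()

def draft_dist(ctx):
    return toy_dist(ctx, temperature=1.5)   # cheap, smoother "draft" model

def target_dist(ctx):
    return toy_dist(ctx, temperature=1.0)   # expensive "target" model

def speculative_step(ctx, gamma=4):
    """One draft-and-verify round: propose gamma tokens with the draft model,
    then accept or reject each against the target model's probabilities."""
    proposals, q_probs, work = [], [], list(ctx)
    for _ in range(gamma):
        q = draft_dist(work)
        tok = int(rng.choice(VOCAB, p=q))
        proposals.append(tok)
        q_probs.append(q)
        work.append(tok)

    out = list(ctx)
    for tok, q in zip(proposals, q_probs):
        p = target_dist(out)
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)                      # draft token verified
        else:
            residual = np.maximum(p - q, 0.0)    # resample from the leftover mass
            if residual.sum() == 0:
                residual = p
            out.append(int(rng.choice(VOCAB, p=residual / residual.sum())))
            return out                           # stop at the first rejection
    # all gamma drafts accepted: take one bonus token from the target model
    out.append(int(rng.choice(VOCAB, p=target_dist(out))))
    return out

print(speculative_step([1, 2, 3]))
```

The min(1, p/q) acceptance rule with residual resampling is what keeps the output distribution identical to the target model's; draft-tree methods such as EAGLE-2 primarily change how the proposals are generated and organized.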

Multi-layer transformers gradient can be approximated in almost linear time

Y Liang, Z Sha, Z Shi, Z Song, Y Zhou - arXiv preprint arXiv:2408.13233, 2024 - arxiv.org
The computational complexity of the self-attention mechanism in popular transformer
architectures poses significant challenges for training and inference, and becomes the …

A tighter complexity analysis of SparseGPT

X Li, Y Liang, Z Shi, Z Song - arXiv preprint arXiv:2408.12151, 2024 - arxiv.org
In this work, we improved the analysis of the running time of SparseGPT [Frantar, Alistarh
ICML 2023] from $O(d^{3})$ to $O(d^{\omega} + d^{2+a+o(1)} + d^{1+\omega(1,1,a)-a})$ …
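For readability, the running-time bound quoted in the snippet, typeset as a single expression. Here $\omega$ is the square matrix-multiplication exponent and $\omega(1,1,a)$ the corresponding rectangular exponent; reading $a$ as a tunable parameter trading off the last two terms is our interpretation of the truncated snippet:

```latex
% Bound quoted in the snippet (typeset for readability).
% \omega: square matrix-multiplication exponent;
% \omega(1,1,a): exponent for multiplying a d x d matrix by a d x d^a matrix.
\[
  O\!\left(d^{3}\right)
  \;\longrightarrow\;
  O\!\left(d^{\omega} + d^{\,2+a+o(1)} + d^{\,1+\omega(1,1,a)-a}\right)
\]
```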

MagicPIG: LSH sampling for efficient LLM generation

Z Chen, R Sadhukhan, Z Ye, Y Zhou, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) with long context windows have gained significant attention.
However, the KV cache, stored to avoid re-computation, becomes a bottleneck. Various …
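The title names LSH sampling over the KV cache. As a generic illustration of the locality-sensitive-hashing idea only (not MagicPIG's actual sampling estimator), a random-hyperplane SimHash sketch that buckets cached keys so a query scores only the colliding ones; the dimensions and variable names are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_KEYS, N_BITS = 64, 1000, 8   # head dim, cached keys, hash bits (illustrative)

# Random-hyperplane (SimHash) LSH: vectors with high cosine similarity tend
# to land in the same bucket, so a query touches only a small subset of keys.
hyperplanes = rng.normal(size=(N_BITS, D))

def simhash(x):
    """Sign pattern of x against the random hyperplanes, packed into an int."""
    bits = (hyperplanes @ x > 0).astype(int)
    return int("".join(map(str, bits)), 2)

# Index the cached keys once.
keys = rng.normal(size=(N_KEYS, D))
buckets = {}
for i, k in enumerate(keys):
    buckets.setdefault(simhash(k), []).append(i)

# At decode time, hash the query and score only the colliding keys
# instead of attending over the full cache.
query = rng.normal(size=D)
candidates = buckets.get(simhash(query), [])
scores = keys[candidates] @ query if candidates else np.array([])
print(f"{len(candidates)} of {N_KEYS} keys touched")
```

In practice, multiple hash tables and bit widths trade recall against the number of keys touched per query.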

ShadowKV: KV cache in shadows for high-throughput long-context LLM inference

H Sun, LW Chang, W Bao, S Zheng, N Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
With the widespread deployment of long-context large language models (LLMs), there has
been a growing demand for efficient support of high-throughput inference. However, as the …

Recycled Attention: Efficient inference for long-context language models

F Xu, T Goyal, E Choi - arXiv preprint arXiv:2411.05787, 2024 - arxiv.org
Generating long sequences of tokens given a long-context input imposes a heavy
computational burden for large language models (LLMs). One of the computational …

A theoretical perspective for speculative decoding algorithm

M Yin, M Chen, K Huang, M Wang - arXiv preprint arXiv:2411.00841, 2024 - arxiv.org
Transformer-based autoregressive sampling has been the major bottleneck for slowing
down large language model inferences. One effective way to accelerate inference is …

SCBench: A KV cache-centric analysis of long-context methods

Y Li, H Jiang, Q Wu, X Luo, S Ahn, C Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Long-context LLMs have enabled numerous downstream applications but also introduced
significant challenges related to computational and memory efficiency. To address these …

Seed: Accelerating reasoning tree construction via scheduled speculative decoding

Z Wang, J Wu, Y Lai, C Zhang, D Zhou - arXiv preprint arXiv:2406.18200, 2024 - arxiv.org
Large Language Models (LLMs) demonstrate remarkable emergent abilities across various
tasks, yet fall short of complex reasoning and planning tasks. The tree-search-based …