AdaEAGLE: Optimizing Speculative Decoding via Explicit Modeling of Adaptive Draft Structures

S Zhang, H Wang, D Ma, Z Zhu, L Chen, K Lan… - arXiv preprint arXiv …, 2024 - arxiv.org
Speculative Decoding (SD) is a popular lossless technique for accelerating the inference of
Large Language Models (LLMs). We show that the decoding speed of SD frameworks with …