AdaEAGLE: Optimizing Speculative Decoding via Explicit Modeling of Adaptive Draft Structures
Speculative Decoding (SD) is a popular lossless technique for accelerating the inference of
Large Language Models (LLMs). We show that the decoding speed of SD frameworks with …