SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation
The Key-Value (KV) cache has become a bottleneck for LLMs in long-context generation.
Despite the numerous efforts in this area, the optimization for the decoding phase is …
Make Every Token Count: A Systematic Survey on Decoding Methods for Foundation Models
Foundation models, such as large language models (LLMs) and large vision-language
models (LVLMs), have gained significant attention for their remarkable performance across …