Early Exit Is a Natural Capability in Transformer-based Models: An Empirical Study on Early Exit without Joint Optimization
Large language models (LLMs) exhibit exceptional performance across various downstream
tasks. However, they encounter limitations due to slow inference speeds stemming from their …