Early Exit Is a Natural Capability in Transformer-based Models: An Empirical Study on Early Exit without Joint Optimization

W Shan, L Meng, T Zheng, Y Luo, B Li, T Xiao… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) exhibit exceptional performance across various downstream
tasks. However, they encounter limitations due to slow inference speeds stemming from their …