Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

Z Liu, J Wang, T Dao, T Zhou, B Yuan… - International …, 2023 - proceedings.mlr.press
Large language models (LLMs) with hundreds of billions of parameters have sparked a new
wave of exciting AI applications. However, they are computationally expensive at inference …

Federated empirical risk minimization via second-order method

S Bian, Z Song, J Yin - arXiv preprint arXiv:2305.17482, 2023 - arxiv.org
Many convex optimization problems with important applications in machine learning are
formulated as empirical risk minimization (ERM). There are several examples: linear and …

On Computational Limits of FlowAR Models: Expressivity and Efficiency

C Gong, Y Ke, X Li, Y Liang, Z Sha, Z Shi… - arXiv preprint arXiv …, 2025 - arxiv.org
The expressive power and computational complexity of deep visual generative models, such
as flow-based and autoregressive (AR) models, have attracted considerable interest for their …