Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Large language models: a comprehensive survey of its applications, challenges, limitations, and future prospects

MU Hadi, R Qureshi, A Shah, M Irfan, A Zafar… - Authorea …, 2023 - researchgate.net
Within the vast expanse of computerized language processing, a revolutionary entity known
as Large Language Models (LLMs) has emerged, wielding immense power in its capacity to …

Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng, J Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …

A survey on efficient inference for large language models

Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …

LoongServe: Efficiently serving long-context large language models with elastic sequence parallelism

B Wu, S Liu, Y Zhong, P Sun, X Liu, X Jin - Proceedings of the ACM …, 2024 - dl.acm.org
The context window of large language models (LLMs) is rapidly increasing, leading to a
huge variance in resource usage between different requests as well as between different …

MInference 1.0: Accelerating pre-filling for long-context LLMs via dynamic sparse attention

H Jiang, Y Li, C Zhang, Q Wu, X Luo, S Ahn… - arXiv preprint arXiv …, 2024 - arxiv.org
The computational challenges of Large Language Model (LLM) inference remain a
significant barrier to their widespread deployment, especially as prompt lengths continue to …

Advancing transformer architecture in long-context large language models: A comprehensive survey

Y Huang, J Xu, J Lai, Z Jiang, T Chen, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformer-based Large Language Models (LLMs) have been applied in diverse areas
such as knowledge bases, human interfaces, and dynamic agents, marking a stride …

ZipVL: Efficient large vision-language models with dynamic token sparsification and KV cache compression

Y He, F Chen, J Liu, W Shao, H Zhou, K Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The efficiency of large vision-language models (LVLMs) is constrained by the computational
bottleneck of the attention mechanism during the prefill phase and the memory bottleneck of …

RL4CO: an extensive reinforcement learning for combinatorial optimization benchmark

F Berto, C Hua, J Park, L Luttmann, Y Ma, F Bu… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce RL4CO, an extensive reinforcement learning (RL) for combinatorial
optimization (CO) benchmark. RL4CO employs state-of-the-art software libraries as well as …