Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Large language models: a comprehensive survey of its applications, challenges, limitations, and future prospects

MU Hadi, R Qureshi, A Shah, M Irfan, A Zafar… - Authorea …, 2023 - researchgate.net
Within the field of computerized language processing, Large Language Models (LLMs) have
emerged as a revolutionary class of systems, wielding immense power in their capacity to …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang… - arXiv preprint arXiv …, 2023 - paper-notes.zhjwpku.com
Ever since the Turing Test was proposed in the 1950s, humans have explored how machines
might master language intelligence. Language is essentially a complex, intricate system of …

Mamba: Linear-time sequence modeling with selective state spaces

A Gu, T Dao - arXiv preprint arXiv:2312.00752, 2023 - minjiazhang.github.io
Foundation models, now powering most of the exciting applications in deep learning, are
almost universally based on the Transformer architecture and its core attention module …

VMamba: Visual state space model

Y Liu, Y Tian, Y Zhao, H Yu, L Xie… - Advances in neural …, 2025 - proceedings.neurips.cc
Designing computationally efficient network architectures remains an ongoing necessity in
computer vision. In this paper, we adapt Mamba, a state-space language model, into …

Lost in the middle: How language models use long contexts

NF Liu, K Lin, J Hewitt, A Paranjape… - Transactions of the …, 2024 - direct.mit.edu
While recent language models have the ability to take long contexts as input, relatively little
is known about how well they use longer context. We analyze the performance of language …

The falcon series of open language models

E Almazrouei, H Alobeidli, A Alshamsi… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce the Falcon series: 7B, 40B, and 180B parameter causal decoder-only models
trained on diverse, high-quality corpora predominantly assembled from web data. The …

Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality

T Dao, A Gu - arXiv preprint arXiv:2405.21060, 2024 - arxiv.org
While Transformers have been the main architecture behind deep learning's success in
language modeling, state-space models (SSMs) such as Mamba have recently been shown …

RWKV: Reinventing RNNs for the transformer era

B Peng, E Alcaide, Q Anthony, A Albalak… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have revolutionized almost all natural language processing (NLP) tasks but
suffer from memory and computational complexity that scales quadratically with sequence …

xLSTM: Extended long short-term memory

M Beck, K Pöppel, M Spanring, A Auer… - Advances in …, 2025 - proceedings.neurips.cc
In the 1990s, the constant error carousel and gating were introduced as the central ideas of
the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and …