Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Large language models: a comprehensive survey of its applications, challenges, limitations, and future prospects

MU Hadi, R Qureshi, A Shah, M Irfan, A Zafar… - Authorea …, 2023 - researchgate.net
Within the field of computerized language processing, Large Language Models (LLMs) have
emerged as a revolutionary class of systems, wielding immense power in their capacity to …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang… - arXiv preprint arXiv …, 2023 - paper-notes.zhjwpku.com
Ever since the Turing Test was proposed in the 1950s, humans have explored how machines
might master language intelligence. Language is essentially a complex, intricate system of …

Mamba: Linear-time sequence modeling with selective state spaces

A Gu, T Dao - arXiv preprint arXiv:2312.00752, 2023 - minjiazhang.github.io
Foundation models, now powering most of the exciting applications in deep learning, are
almost universally based on the Transformer architecture and its core attention module …

VMamba: Visual state space model

Y Liu, Y Tian, Y Zhao, H Yu, L Xie… - Advances in neural …, 2025 - proceedings.neurips.cc
Designing computationally efficient network architectures remains an ongoing necessity in
computer vision. In this paper, we adapt Mamba, a state-space language model, into …

Lost in the middle: How language models use long contexts

NF Liu, K Lin, J Hewitt, A Paranjape… - Transactions of the …, 2024 - direct.mit.edu
While recent language models have the ability to take long contexts as input, relatively little
is known about how well they use longer context. We analyze the performance of language …

The falcon series of open language models

E Almazrouei, H Alobeidli, A Alshamsi… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce the Falcon series: 7B, 40B, and 180B parameter causal decoder-only models
trained on diverse, high-quality corpora predominantly assembled from web data. The …

Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality

T Dao, A Gu - arXiv preprint arXiv:2405.21060, 2024 - arxiv.org
While Transformers have been the main architecture behind deep learning's success in
language modeling, state-space models (SSMs) such as Mamba have recently been shown …

RWKV: Reinventing RNNs for the transformer era

B Peng, E Alcaide, Q Anthony, A Albalak… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have revolutionized almost all natural language processing (NLP) tasks but
suffer from memory and computational complexity that scales quadratically with sequence …

xLSTM: Extended long short-term memory

M Beck, K Pöppel, M Spanring, A Auer… - Advances in …, 2025 - proceedings.neurips.cc
In the 1990s, the constant error carousel and gating were introduced as the central ideas of
the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and …