A review of large language models and autonomous agents in chemistry

MC Ramos, CJ Collison, AD White - Chemical Science, 2025 - pubs.rsc.org
Large language models (LLMs) have emerged as powerful tools in chemistry, significantly
impacting molecule design, property prediction, and synthesis optimization. This review …

A state-of-the-art review of long short-term memory models with applications in hydrology and water resources

Z Feng, J Zhang, W Niu - Applied Soft Computing, 2024 - Elsevier
Long Short-Term Memory (LSTM) has recently emerged as a crucial tool for
scientific research in hydrology and water resources. Despite its widespread use, a …

RULER: What's the Real Context Size of Your Long-Context Language Models?

CP Hsieh, S Sun, S Kriman, S Acharya… - arXiv preprint arXiv …, 2024 - arxiv.org
The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of
information (the "needle") from long distractor texts (the "haystack"), has been widely …

Learning to (learn at test time): RNNs with expressive hidden states

Y Sun, X Li, K Dalal, J Xu, A Vikram, G Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Self-attention performs well in long context but has quadratic complexity. Existing RNN
layers have linear complexity, but their performance in long context is limited by the …
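
For orientation, the standard asymptotic comparison behind this trade-off (a generic back-of-the-envelope estimate, not a result from this paper) is

\[
C_{\text{attention}}(L) = O(L^{2} d), \qquad C_{\text{RNN}}(L) = O(L\, d^{2}),
\]

where L is the sequence length and d the model (state) dimension; for L much larger than d, the quadratic term dominates, which is why linear-time recurrent layers become attractive at long context.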

FlashAttention-3: Fast and accurate attention with asynchrony and low-precision

J Shah, G Bikshandi, Y Zhang, V Thakkar… - arXiv preprint arXiv …, 2024 - arxiv.org
Attention, as a core layer of the ubiquitous Transformer architecture, is the bottleneck for
large language models and long-context applications. FlashAttention elaborated an …

Transformers are multi-state RNNs

M Oren, M Hassid, N Yarden, Y Adi… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers are considered conceptually different from the previous generation of state-of-
the-art NLP models, recurrent neural networks (RNNs). In this work, we demonstrate that …

Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality

T Dao, A Gu - arXiv preprint arXiv:2405.21060, 2024 - arxiv.org
While Transformers have been the main architecture behind deep learning's success in
language modeling, state-space models (SSMs) such as Mamba have recently been shown …

Advanced stock price prediction with xLSTM-based models: Improving long-term forecasting

X Fan, C Tao, J Zhao - 2024 11th International Conference on …, 2024 - ieeexplore.ieee.org
Stock price prediction has long been a critical area of research in financial modeling. The
inherent complexity of financial markets, characterized by both short-term fluctuations and …

The Mamba in the Llama: Distilling and accelerating hybrid models

J Wang, D Paliotta, A May, AM Rush, T Dao - arXiv preprint arXiv …, 2024 - arxiv.org
Linear RNN architectures, like Mamba, can be competitive with Transformer models in
language modeling while having advantageous deployment characteristics. Given the focus …

Inference scaling for long-context retrieval augmented generation

Z Yue, H Zhuang, A Bai, K Hui, R Jagerman… - arXiv preprint arXiv …, 2024 - arxiv.org
The scaling of inference computation has unlocked the potential of long-context large
language models (LLMs) across diverse settings. For knowledge-intensive tasks, the …