Efficient transformers: A survey

Y Tay, M Dehghani, D Bahri, D Metzler - ACM Computing Surveys, 2022 - dl.acm.org
Transformer model architectures have garnered immense interest lately due to their
effectiveness across a range of domains like language, vision, and reinforcement learning …

An empirical survey on long document summarization: Datasets, models, and metrics

HY Koh, J Ju, M Liu, S Pan - ACM Computing Surveys, 2022 - dl.acm.org
Long documents such as academic articles and business reports have been the standard
format to detail out important issues and complicated subjects that require extra attention. An …

Mamba: Linear-time sequence modeling with selective state spaces

A Gu, T Dao - arXiv preprint arXiv:2312.00752, 2023 - minjiazhang.github.io
Foundation models, now powering most of the exciting applications in deep learning, are
almost universally based on the Transformer architecture and its core attention module …

U-Mamba: Enhancing long-range dependency for biomedical image segmentation

J Ma, F Li, B Wang - arXiv preprint arXiv:2401.04722, 2024 - arxiv.org
Convolutional Neural Networks (CNNs) and Transformers have been the most popular
architectures for biomedical image segmentation, but both of them have limited ability to …

RWKV: Reinventing RNNs for the Transformer era

B Peng, E Alcaide, Q Anthony, A Albalak… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have revolutionized almost all natural language processing (NLP) tasks but
suffer from memory and computational complexity that scales quadratically with sequence …
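For orientation on the quadratic-scaling point this snippet raises, below is a minimal sketch of why vanilla attention costs grow with the square of sequence length: it materializes an (n, n) score matrix. The function name, shapes, and NumPy usage are illustrative assumptions, not RWKV's or any specific model's implementation.

```python
# Illustrative sketch (assumed, not from the paper): single-head scaled
# dot-product attention, showing the (n, n) intermediate that drives the
# quadratic memory and compute cost.
import numpy as np

def naive_attention(Q, K, V):
    """Attention over a length-n sequence with head dimension d."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                       # (n, n): quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                  # (n, d) output

# Toy usage with random inputs (illustrative only).
n, d = 1024, 64
rng = np.random.default_rng(0)
out = naive_attention(rng.standard_normal((n, d)),
                      rng.standard_normal((n, d)),
                      rng.standard_normal((n, d)))
print(out.shape)  # (1024, 64); the score matrix was 1024 x 1024
```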

HyenaDNA: Long-range genomic sequence modeling at single nucleotide resolution

E Nguyen, M Poli, M Faizi, A Thomas… - Advances in Neural …, 2023 - proceedings.neurips.cc
Genomic (DNA) sequences encode an enormous amount of information for gene regulation
and protein synthesis. Similar to natural language models, researchers have proposed …

Resurrecting recurrent neural networks for long sequences

A Orvieto, SL Smith, A Gu, A Fernando… - International …, 2023 - proceedings.mlr.press
Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are
hard to optimize and slow to train. Deep state-space models (SSMs) have recently been …

Hungry hungry hippos: Towards language modeling with state space models

DY Fu, T Dao, KK Saab, AW Thomas, A Rudra… - arXiv preprint arXiv …, 2022 - arxiv.org
State space models (SSMs) have demonstrated state-of-the-art sequence modeling
performance in some modalities, but underperform attention in language modeling …

Simplified state space layers for sequence modeling

JTH Smith, A Warrington, SW Linderman - arXiv preprint arXiv:2208.04933, 2022 - arxiv.org
Models using structured state space sequence (S4) layers have achieved state-of-the-art
performance on long-range sequence modeling tasks. An S4 layer combines linear state …
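The snippet is cut off where it describes what an S4 layer combines. For context, below is a minimal sketch of the generic linear state-space recurrence that S4-style layers (and the S5 simplification above) build on. The function name, the dense matrices A, B, C, and the sequential loop are illustrative assumptions; the papers use structured parameterizations and faster convolutional or parallel-scan computation.

```python
# Illustrative sketch (assumed, not the S4/S5 implementation): the linear
# state-space recurrence x_k = A x_{k-1} + B u_k, y_k = C x_k over a 1-D input.
import numpy as np

def ssm_scan(A, B, C, u):
    """Run the linear recurrence over input sequence u and return the outputs."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:                 # sequential scan; S4/S5 avoid this loop
        x = A @ x + B * u_k       # linear state update
        ys.append(C @ x)          # linear readout
    return np.array(ys)

# Toy usage with random parameters (illustrative only).
rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)               # stable diagonal state matrix
B = rng.standard_normal(4)
C = rng.standard_normal(4)
y = ssm_scan(A, B, C, rng.standard_normal(32))
print(y.shape)  # (32,)
```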

Do the rewards justify the means? Measuring trade-offs between rewards and ethical behavior in the Machiavelli benchmark

A Pan, JS Chan, A Zou, N Li, S Basart… - International …, 2023 - proceedings.mlr.press
Artificial agents have traditionally been trained to maximize reward, which may incentivize
power-seeking and deception, analogous to how next-token prediction in language models …