Efficient transformers: A survey

Y Tay, M Dehghani, D Bahri, D Metzler - ACM Computing Surveys, 2022 - dl.acm.org
Transformer model architectures have garnered immense interest lately due to their
effectiveness across a range of domains like language, vision, and reinforcement learning …

An empirical survey on long document summarization: Datasets, models, and metrics

HY Koh, J Ju, M Liu, S Pan - ACM Computing Surveys, 2022 - dl.acm.org
Long documents such as academic articles and business reports have been the standard
format to detail out important issues and complicated subjects that require extra attention. An …

Mamba: Linear-time sequence modeling with selective state spaces

A Gu, T Dao - arXiv preprint arXiv:2312.00752, 2023 - minjiazhang.github.io
Foundation models, now powering most of the exciting applications in deep learning, are
almost universally based on the Transformer architecture and its core attention module …

U-Mamba: Enhancing long-range dependency for biomedical image segmentation

J Ma, F Li, B Wang - arXiv preprint arXiv:2401.04722, 2024 - arxiv.org
Convolutional Neural Networks (CNNs) and Transformers have been the most popular
architectures for biomedical image segmentation, but both of them have limited ability to …

RWKV: Reinventing RNNs for the Transformer era

B Peng, E Alcaide, Q Anthony, A Albalak… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have revolutionized almost all natural language processing (NLP) tasks but
suffer from memory and computational complexity that scales quadratically with sequence …
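For orientation on the quadratic-scaling point this snippet raises, below is a minimal sketch of why vanilla attention costs grow with the square of sequence length: it materializes an (n, n) score matrix. The function name, shapes, and NumPy usage are illustrative assumptions, not RWKV's or any specific model's implementation.

```python
# Illustrative sketch (assumed, not from the paper): single-head scaled
# dot-product attention, showing the (n, n) intermediate that drives the
# quadratic memory and compute cost.
import numpy as np

def naive_attention(Q, K, V):
    """Attention over a length-n sequence with head dimension d."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                       # (n, n): quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                  # (n, d) output

# Toy usage with random inputs (illustrative only).
n, d = 1024, 64
rng = np.random.default_rng(0)
out = naive_attention(rng.standard_normal((n, d)),
                      rng.standard_normal((n, d)),
                      rng.standard_normal((n, d)))
print(out.shape)  # (1024, 64); the score matrix was 1024 x 1024
```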

HyenaDNA: Long-range genomic sequence modeling at single nucleotide resolution

E Nguyen, M Poli, M Faizi, A Thomas… - Advances in Neural …, 2023 - proceedings.neurips.cc
Genomic (DNA) sequences encode an enormous amount of information for gene regulation
and protein synthesis. Similar to natural language models, researchers have proposed …

Resurrecting recurrent neural networks for long sequences

A Orvieto, SL Smith, A Gu, A Fernando… - International …, 2023 - proceedings.mlr.press
Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are
hard to optimize and slow to train. Deep state-space models (SSMs) have recently been …

Hungry hungry hippos: Towards language modeling with state space models

DY Fu, T Dao, KK Saab, AW Thomas, A Rudra… - arXiv preprint arXiv …, 2022 - arxiv.org
State space models (SSMs) have demonstrated state-of-the-art sequence modeling
performance in some modalities, but underperform attention in language modeling …

Simplified state space layers for sequence modeling

JTH Smith, A Warrington, SW Linderman - arXiv preprint arXiv:2208.04933, 2022 - arxiv.org
Models using structured state space sequence (S4) layers have achieved state-of-the-art
performance on long-range sequence modeling tasks. An S4 layer combines linear state …
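The snippet is cut off where it describes what an S4 layer combines. For context, below is a minimal sketch of the generic linear state-space recurrence that S4-style layers (and the S5 simplification above) build on. The function name, the dense matrices A, B, C, and the sequential loop are illustrative assumptions; the papers use structured parameterizations and faster convolutional or parallel-scan computation.

```python
# Illustrative sketch (assumed, not the S4/S5 implementation): the linear
# state-space recurrence x_k = A x_{k-1} + B u_k, y_k = C x_k over a 1-D input.
import numpy as np

def ssm_scan(A, B, C, u):
    """Run the linear recurrence over input sequence u and return the outputs."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:                 # sequential scan; S4/S5 avoid this loop
        x = A @ x + B * u_k       # linear state update
        ys.append(C @ x)          # linear readout
    return np.array(ys)

# Toy usage with random parameters (illustrative only).
rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)               # stable diagonal state matrix
B = rng.standard_normal(4)
C = rng.standard_normal(4)
y = ssm_scan(A, B, C, rng.standard_normal(32))
print(y.shape)  # (32,)
```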

Do the rewards justify the means? Measuring trade-offs between rewards and ethical behavior in the Machiavelli benchmark

A Pan, JS Chan, A Zou, N Li, S Basart… - International …, 2023 - proceedings.mlr.press
Artificial agents have traditionally been trained to maximize reward, which may incentivize
power-seeking and deception, analogous to how next-token prediction in language models …