Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

A survey of transformers

T Lin, Y Wang, X Liu, X Qiu - AI Open, 2022 - Elsevier
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …

Harnessing the power of LLMs in practice: A survey on ChatGPT and beyond

J Yang, H Jin, R Tang, X Han, Q Feng, H Jiang… - ACM Transactions on …, 2024 - dl.acm.org
This article presents a comprehensive and practical guide for practitioners and end-users
working with Large Language Models (LLMs) in their downstream Natural Language …

RWKV: Reinventing RNNs for the transformer era

B Peng, E Alcaide, Q Anthony, A Albalak… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have revolutionized almost all natural language processing (NLP) tasks but
suffer from memory and computational complexity that scales quadratically with sequence …
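
The quadratic scaling this abstract refers to is easy to make concrete. The back-of-the-envelope sketch below is illustrative only; the sequence lengths and fp16 storage are assumptions, not figures from the paper.

# Rough cost of materializing one full attention score matrix per head,
# illustrating quadratic growth. Numbers are illustrative assumptions.
for n in (1_024, 8_192, 65_536):              # assumed sequence lengths
    scores = n * n                            # one n x n score matrix
    mib_fp16 = scores * 2 / 2**20             # 2 bytes per fp16 score
    print(f"n={n:>6}: {scores:>14,} scores ~ {mib_fp16:9.1f} MiB")
# Doubling n quadruples both compute and memory; recurrent reformulations
# such as RWKV aim to keep per-token cost roughly constant instead.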

Flatten transformer: Vision transformer using focused linear attention

D Han, X Pan, Y Han, S Song… - Proceedings of the …, 2023 - openaccess.thecvf.com
The quadratic computation complexity of self-attention has been a persistent challenge
when applying Transformer models to vision tasks. Linear attention, on the other hand, offers …
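
The contrast between quadratic self-attention and linear attention comes from reassociating the attention product. The numpy sketch below shows that generic reordering with a placeholder elu+1 feature map; it is not the focused linear attention proposed in this paper, whose feature map and rank-restoration details differ.

import numpy as np

def feature_map(x):
    # Placeholder positive feature map (an assumption, not the paper's focused map).
    return np.where(x > 0, x + 1.0, np.exp(x))   # elu(x) + 1

def linear_attention(Q, K, V):
    # Reassociating (phi(Q) phi(K)^T) V as phi(Q) (phi(K)^T V) drops the cost
    # from O(n^2 d) to O(n d^2); no n x n matrix is ever formed.
    Qf, Kf = feature_map(Q), feature_map(K)      # (n, d)
    kv = Kf.T @ V                                # (d, d)
    z = Qf @ Kf.sum(axis=0)                      # (n,) row-wise normalizer
    return (Qf @ kv) / z[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4096, 64)) for _ in range(3))
out = linear_attention(Q, K, V)                  # (4096, 64), no (n, n) buffer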

FlashAttention: Fast and memory-efficient exact attention with IO-awareness

T Dao, D Fu, S Ermon, A Rudra… - Advances in Neural …, 2022 - proceedings.neurips.cc
Transformers are slow and memory-hungry on long sequences, since the time and memory
complexity of self-attention are quadratic in sequence length. Approximate attention …
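
FlashAttention computes exact (not approximate) attention; the idea the abstract alludes to is processing keys and values block by block with a running softmax, so the n x n score matrix never has to be stored. The numpy sketch below shows that online-softmax accumulation only; block size and shapes are assumptions, and the paper's IO-aware GPU tiling and backward pass are not represented.

import numpy as np

def blockwise_attention(Q, K, V, block=128):
    # Exact softmax attention computed one key/value block at a time,
    # keeping a running row-wise max (m) and normalizer (l) instead of
    # materializing the full (n, n) score matrix.
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q)
    m = np.full(n, -np.inf)
    l = np.zeros(n)
    for j in range(0, K.shape[0], block):
        S = (Q @ K[j:j + block].T) * scale       # (n, block) partial scores
        m_new = np.maximum(m, S.max(axis=1))
        P = np.exp(S - m_new[:, None])
        alpha = np.exp(m - m_new)                # rescale earlier partial sums
        l = alpha * l + P.sum(axis=1)
        O = alpha[:, None] * O + P @ V[j:j + block]
        m = m_new
    return O / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((512, 64)) for _ in range(3))
out = blockwise_attention(Q, K, V)

# Matches naive softmax attention up to floating-point error:
S = (Q @ K.T) / np.sqrt(64)
P = np.exp(S - S.max(axis=1, keepdims=True))
assert np.allclose(out, (P / P.sum(axis=1, keepdims=True)) @ V)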

FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting

T Zhou, Z Ma, Q Wen, X Wang… - … on machine learning, 2022 - proceedings.mlr.press
Long-term time series forecasting is challenging since prediction accuracy tends to
decrease dramatically with the increasing horizon. Although Transformer-based methods …
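
The "frequency enhanced decomposed" idea in the title can be gestured at with a toy sketch: split a series into a moving-average trend and a seasonal residual, then keep only a few Fourier modes of the residual. The window size and number of retained modes below are illustrative assumptions, and this is not the paper's actual block design.

import numpy as np

def decompose_and_filter(x, window=25, keep=8):
    # Moving-average trend plus a seasonal residual compressed to its
    # 'keep' largest-magnitude Fourier modes.
    pad = window // 2
    xp = np.pad(x, (pad, pad), mode="edge")
    trend = np.convolve(xp, np.ones(window) / window, mode="valid")[: len(x)]
    seasonal = x - trend
    spec = np.fft.rfft(seasonal)
    spec[np.argsort(np.abs(spec))[:-keep]] = 0.0   # drop all but 'keep' modes
    return trend, np.fft.irfft(spec, n=len(x))

t = np.linspace(0.0, 10.0, 500)
rng = np.random.default_rng(0)
series = 0.3 * t + np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(500)
trend, seasonal_approx = decompose_and_filter(series)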

Pure transformers are powerful graph learners

J Kim, D Nguyen, S Min, S Cho… - Advances in Neural …, 2022 - proceedings.neurips.cc
We show that standard Transformers without graph-specific modifications can lead to
promising results in graph learning both in theory and practice. Given a graph, we simply …
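
The abstract's claim is that an unmodified Transformer can process a graph once the graph is serialized into tokens. The PyTorch sketch below shows one such flattening, with one token per node and per edge tagged by one-hot node identifiers; the identifier scheme, dimensions, and toy graph are illustrative assumptions rather than the paper's exact construction.

import torch
import torch.nn as nn

def graph_to_tokens(node_feats, edge_index, edge_feats, d_model=64):
    # One token per node and one per edge; each token carries identifier
    # embeddings for the node(s) it touches, so a plain Transformer can
    # recover incidence structure without any graph-specific layers.
    n = node_feats.shape[0]
    node_id = torch.eye(n)                                    # one-hot identifiers
    node_tok = torch.cat([node_feats, node_id, node_id], dim=-1)
    src, dst = edge_index
    edge_tok = torch.cat([edge_feats, node_id[src], node_id[dst]], dim=-1)
    tokens = torch.cat([node_tok, edge_tok], dim=0)           # (n + m, f + 2n)
    return nn.Linear(tokens.shape[-1], d_model)(tokens)

# Toy graph: 4 nodes with 8-dim features, 3 edges with 8-dim features.
node_feats, edge_feats = torch.randn(4, 8), torch.randn(3, 8)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
tokens = graph_to_tokens(node_feats, edge_index, edge_feats)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)
out = encoder(tokens.unsqueeze(0))    # standard encoder, no graph-specific code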

Simplified state space layers for sequence modeling

JTH Smith, A Warrington, SW Linderman - arXiv preprint arXiv:2208.04933, 2022 - arxiv.org
Models using structured state space sequence (S4) layers have achieved state-of-the-art
performance on long-range sequence modeling tasks. An S4 layer combines linear state …
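
The linear state-space recurrence at the core of S4-style layers is short to write down. The sketch below is a sequential reference under assumed random matrices; the structured parameterization, discretization, and the convolutional or parallel-scan evaluation that give these layers their speed are omitted.

import numpy as np

def ssm_scan(A, B, C, D, u):
    # x_k = A x_{k-1} + B u_k ;  y_k = C x_k + D u_k
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:                       # sequential scan over the input signal
        x = A @ x + B * u_k
        ys.append(C @ x + D * u_k)
    return np.array(ys)

state = 16
rng = np.random.default_rng(0)
A = 0.95 * np.eye(state) + 0.01 * rng.standard_normal((state, state))
B, C, D = rng.standard_normal(state), rng.standard_normal(state), 0.0
y = ssm_scan(A, B, C, D, rng.standard_normal(1000))   # (1000,) outputs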

Perceiver IO: A general architecture for structured inputs & outputs

A Jaegle, S Borgeaud, JB Alayrac, C Doersch… - arXiv preprint arXiv …, 2021 - arxiv.org
A central goal of machine learning is the development of systems that can solve many
problems in as many data domains as possible. Current architectures, however, cannot be …