Advancing transformer architecture in long-context large language models: A comprehensive survey

Y Huang, J Xu, J Lai, Z Jiang, T Chen, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
With the boom ignited by ChatGPT, Transformer-based Large Language Models (LLMs)
have paved a revolutionary path toward Artificial General Intelligence (AGI) and have been …

Socialized learning: A survey of the paradigm shift for edge intelligence in networked systems

X Wang, Y Zhao, C Qiu, Q Hu… - … Surveys & Tutorials, 2024 - ieeexplore.ieee.org
Amidst the robust impetus from artificial intelligence (AI) and big data, edge intelligence (EI)
has emerged as a nascent computing paradigm, synthesizing AI with edge computing (EC) …

EfficientViT: Memory efficient vision transformer with cascaded group attention

X Liu, H Peng, N Zheng, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision transformers have shown great success due to their high model capabilities.
However, their remarkable performance is accompanied by heavy computation costs, which …

FLatten Transformer: Vision transformer using focused linear attention

D Han, X Pan, Y Han, S Song… - Proceedings of the …, 2023 - openaccess.thecvf.com
The quadratic computation complexity of self-attention has been a persistent challenge
when applying Transformer models to vision tasks. Linear attention, on the other hand, offers …
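
The trade-off this entry alludes to can be made concrete: softmax attention materializes an N x N score matrix, whereas kernelized (linear) attention reorders the matrix products so cost grows linearly in sequence length. Below is a minimal NumPy sketch of that generic idea; the feature map `phi` is an illustrative ReLU-based choice, not the focused mapping proposed in the paper, and all function names are ours.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: the N x N score matrix makes this O(N^2) in time and memory.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    # Kernelized attention: computing phi(Q) (phi(K)^T V) reorders the matmuls
    # so the N x N matrix is never formed -- cost becomes O(N d^2).
    # phi here is an illustrative positive feature map, not the paper's
    # "focused" linear attention.
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                        # (d, d_v) summary of keys and values
    normalizer = Qp @ Kp.sum(axis=0)     # (N,) per-query normalization
    return (Qp @ kv) / normalizer[:, None]

N, d = 1024, 64
Q, K, V = (np.random.randn(N, d) for _ in range(3))
out = linear_attention(Q, K, V)          # (N, d), no N x N intermediate
```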

FlashAttention: Fast and memory-efficient exact attention with IO-awareness

T Dao, D Fu, S Ermon, A Rudra… - Advances in Neural …, 2022 - proceedings.neurips.cc
Transformers are slow and memory-hungry on long sequences, since the time and memory
complexity of self-attention are quadratic in sequence length. Approximate attention …
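
The quadratic memory cost comes from materializing the full attention matrix; FlashAttention avoids this by processing key/value blocks with a running softmax inside a fused, IO-aware GPU kernel. The NumPy sketch below illustrates only that online-softmax recurrence (no GPU kernel, no SRAM tiling), with illustrative names.

```python
import numpy as np

def tiled_attention(Q, K, V, block=128):
    # Conceptual sketch of IO-aware exact attention: key/value blocks are
    # folded in with a running ("online") softmax, so the full N x N score
    # matrix is never materialized. The real FlashAttention speedup comes from
    # a fused kernel that keeps each tile in on-chip SRAM; this shows the math only.
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((N, V.shape[1]))
    row_max = np.full(N, -np.inf)
    row_sum = np.zeros(N)
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T * scale                         # (N, block) partial scores
        new_max = np.maximum(row_max, S.max(axis=1))
        rescale = np.exp(row_max - new_max)          # correct earlier partial sums
        P = np.exp(S - new_max[:, None])
        row_sum = row_sum * rescale + P.sum(axis=1)
        out = out * rescale[:, None] + P @ Vb
        row_max = new_max
    return out / row_sum[:, None]                    # equals softmax(QK^T/sqrt(d)) V
```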

Extending context window of large language models via positional interpolation

S Chen, S Wong, L Chen, Y Tian - arXiv preprint arXiv:2306.15595, 2023 - arxiv.org
We present Position Interpolation (PI) that extends the context window sizes of RoPE-based
pretrained LLMs such as LLaMA models to up to 32768 with minimal fine-tuning (within …
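
The core of Position Interpolation is a one-line change to rotary embeddings: positions in the longer sequence are scaled down by the ratio of the pretraining length to the target length, so every position falls inside the range the model saw during training instead of being extrapolated. A minimal sketch, with illustrative function names and the common RoPE base of 10000 assumed:

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0):
    # Standard RoPE frequencies: theta_i = base^(-2i/dim); returns rotation
    # angles of shape (len(positions), dim/2).
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    return np.outer(positions, inv_freq)

def interpolated_angles(positions, dim, train_len, target_len):
    # Position Interpolation: rescale positions by train_len / target_len so a
    # target_len-token sequence maps back into the pretraining position range.
    scale = train_len / target_len               # e.g. 2048 / 8192 = 1/4
    return rope_angles(np.asarray(positions) * scale, dim)

# Example: positions 0..8191 squeezed into the original 0..2047 range.
angles = interpolated_angles(np.arange(8192), dim=128, train_len=2048, target_len=8192)
```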

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Transformer quality in linear time

W Hua, Z Dai, H Liu, Q Le - International conference on …, 2022 - proceedings.mlr.press
We revisit the design choices in Transformers, and propose methods to address their
weaknesses in handling long sequences. First, we propose a simple layer named gated …
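
The layer referred to is the gated attention unit, in which a single cheap attention map gates an expanded representation so that one layer can stand in for the usual attention-plus-FFN pair. The sketch below is a simplified, approximate rendition: the per-dimension query/key offsets, relative-position bias, and the chunked linear-attention variant described in the paper are omitted, and all names are illustrative.

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def gated_attention_unit(X, Wu, Wv, Wz, Wo):
    # Simplified sketch in the spirit of a gated attention unit: a single-head
    # relu^2 attention map (no softmax) gates an expanded value branch, and the
    # result is projected back to the model dimension.
    n, _ = X.shape
    U = silu(X @ Wu)                         # (n, e) gate branch
    V = silu(X @ Wv)                         # (n, e) value branch
    Z = silu(X @ Wz)                         # (n, s) shared small projection
    A = np.maximum(Z @ Z.T / n, 0.0) ** 2    # (n, n) relu^2 attention weights
    return (U * (A @ V)) @ Wo                # gate the attended values, project back
```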

AI-generated content (AIGC): A survey

J Wu, W Gan, Z Chen, S Wan, H Lin - arXiv preprint arXiv:2304.06632, 2023 - arxiv.org
To address the challenges of digital intelligence in the digital economy, artificial intelligence-
generated content (AIGC) has emerged. AIGC uses artificial intelligence to assist or replace …

Memorizing transformers

Y Wu, MN Rabe, DL Hutchins, C Szegedy - arXiv preprint arXiv …, 2022 - arxiv.org
Language models typically need to be trained or finetuned in order to acquire new
knowledge, which involves updating their weights. We instead envision language models …
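
The mechanism envisioned here is attention into a non-trainable external memory of past (key, value) pairs: at each step the query retrieves its top-k nearest stored keys, and the retrieved result is mixed with ordinary local attention. A rough NumPy sketch under those assumptions follows; the fixed scalar gate and exact top-k search are simplifications (the paper learns a per-head gate and uses approximate kNN so the memory can hold very long histories).

```python
import numpy as np

def knn_memory_attention(q, local_K, local_V, mem_K, mem_V, k=32, gate=0.5):
    # Conceptual kNN-augmented attention: the query attends over its local
    # context and over the top-k nearest (key, value) pairs retrieved from an
    # external memory, and the two results are mixed by a gate.
    def attend(q, K, V):
        s = K @ q / np.sqrt(q.shape[-1])
        w = np.exp(s - s.max()); w /= w.sum()
        return w @ V

    idx = np.argsort(mem_K @ q)[-k:]          # exact top-k retrieval by dot product
    local_out = attend(q, local_K, local_V)
    mem_out = attend(q, mem_K[idx], mem_V[idx])
    return gate * mem_out + (1 - gate) * local_out
```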