Advancing transformer architecture in long-context large language models: A comprehensive survey
Transformer-based Large Language Models (LLMs) have been applied in diverse areas
such as knowledge bases, human interfaces, and dynamic agents, marking a stride …
RWKV: Reinventing RNNs for the transformer era
Transformers have revolutionized almost all natural language processing (NLP) tasks but
suffer from memory and computational complexity that scales quadratically with sequence …
SpAtten: Efficient sparse attention architecture with cascade token and head pruning
The attention mechanism is becoming increasingly popular in Natural Language Processing
(NLP) applications, showing superior performance compared with convolutional and recurrent …
Beyond efficiency: A systematic survey of resource-efficient large language models
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …
Simple linear attention language models balance the recall-throughput tradeoff
Recent work has shown that attention-based language models excel at recall, the ability to
ground generations in tokens previously seen in context. However, the efficiency of attention …
Enable deep learning on mobile devices: Methods, systems, and applications
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial
intelligence (AI), including computer vision, natural language processing, and speech …
ELSA: Hardware-software co-design for efficient, lightweight self-attention mechanism in neural networks
The self-attention mechanism is rapidly emerging as one of the most important key primitives
in neural networks (NNs) for its ability to identify the relations within input entities. The self …
Self-attention Does Not Need Memory
MN Rabe, C Staats - arXiv preprint arXiv:2112.05682, 2021 - arxiv.org
We present a very simple algorithm for attention that requires $O(1)$ memory with respect
to sequence length and an extension to self-attention that requires $O(\log n)$ memory …
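For context, a minimal sketch of the constant-memory idea stated in this abstract, for a single query: stream over key/value chunks and keep only a running max, a running softmax denominator, and a running weighted value sum, so peak memory does not grow with sequence length. This is an illustrative NumPy sketch under assumed shapes and names (chunked_attention, chunk size), not the authors' code.

```python
import numpy as np

def chunked_attention(q, K, V, chunk=128):
    # Single-query attention over keys K (n, d) and values V (n, d_v),
    # processed chunk by chunk so peak memory does not grow with n.
    m = -np.inf                      # running max of logits (numerical stability)
    denom = 0.0                      # running softmax denominator
    acc = np.zeros(V.shape[1])       # running weighted sum of values
    for start in range(0, K.shape[0], chunk):
        logits = K[start:start + chunk] @ q          # (chunk,)
        m_new = max(m, float(logits.max()))
        scale = np.exp(m - m_new)                    # rescale earlier partial sums
        p = np.exp(logits - m_new)                   # unnormalized chunk weights
        denom = denom * scale + p.sum()
        acc = acc * scale + p @ V[start:start + chunk]
        m = m_new
    return acc / denom

# Example: agrees with ordinary softmax(K @ q) attention up to float error.
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=8), rng.normal(size=(1000, 8)), rng.normal(size=(1000, 4))
w = np.exp(K @ q - (K @ q).max()); ref = (w / w.sum()) @ V
assert np.allclose(chunked_attention(q, K, V), ref)
```

Each chunk's partial sums are rescaled by exp(m_old - m_new) before merging, so the result matches standard softmax attention while memory stays independent of sequence length.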
Recnmp: Accelerating personalized recommendation with near-memory processing
Personalized recommendation systems leverage deep learning models and account for the
majority of data center AI cycles. Their performance is dominated by memory-bound sparse …
TransPIM: A memory-based acceleration via software-hardware co-design for transformer
Transformer-based models are state-of-the-art for many machine learning (ML) tasks.
Executing Transformer usually requires a long execution time due to the large memory …