Advancing transformer architecture in long-context large language models: A comprehensive survey

Y Huang, J Xu, J Lai, Z Jiang, T Chen, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
With the boom ignited by ChatGPT, Transformer-based Large Language Models (LLMs)
have paved a revolutionary path toward Artificial General Intelligence (AGI) and have been …

Socialized learning: A survey of the paradigm shift for edge intelligence in networked systems

X Wang, Y Zhao, C Qiu, Q Hu… - … Surveys & Tutorials, 2024 - ieeexplore.ieee.org
Amidst the robust impetus from artificial intelligence (AI) and big data, edge intelligence (EI)
has emerged as a nascent computing paradigm, synthesizing AI with edge computing (EC) …

EfficientViT: Memory efficient vision transformer with cascaded group attention

X Liu, H Peng, N Zheng, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision transformers have shown great success due to their high model capabilities.
However, their remarkable performance is accompanied by heavy computation costs, which …

FLatten Transformer: Vision transformer using focused linear attention

D Han, X Pan, Y Han, S Song… - Proceedings of the …, 2023 - openaccess.thecvf.com
The quadratic computation complexity of self-attention has been a persistent challenge
when applying Transformer models to vision tasks. Linear attention, on the other hand, offers …
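
The trade-off this entry alludes to can be made concrete: softmax attention materializes an N x N score matrix, whereas kernelized (linear) attention reorders the matrix products so cost grows linearly in sequence length. Below is a minimal NumPy sketch of that generic idea; the feature map `phi` is an illustrative ReLU-based choice, not the focused mapping proposed in the paper, and all function names are ours.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: the N x N score matrix makes this O(N^2) in time and memory.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    # Kernelized attention: computing phi(Q) (phi(K)^T V) reorders the matmuls
    # so the N x N matrix is never formed -- cost becomes O(N d^2).
    # phi here is an illustrative positive feature map, not the paper's
    # "focused" linear attention.
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                        # (d, d_v) summary of keys and values
    normalizer = Qp @ Kp.sum(axis=0)     # (N,) per-query normalization
    return (Qp @ kv) / normalizer[:, None]

N, d = 1024, 64
Q, K, V = (np.random.randn(N, d) for _ in range(3))
out = linear_attention(Q, K, V)          # (N, d), no N x N intermediate
```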

FlashAttention: Fast and memory-efficient exact attention with IO-awareness

T Dao, D Fu, S Ermon, A Rudra… - Advances in Neural …, 2022 - proceedings.neurips.cc
Transformers are slow and memory-hungry on long sequences, since the time and memory
complexity of self-attention are quadratic in sequence length. Approximate attention …
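
The quadratic memory cost comes from materializing the full attention matrix; FlashAttention avoids this by processing key/value blocks with a running softmax inside a fused, IO-aware GPU kernel. The NumPy sketch below illustrates only that online-softmax recurrence (no GPU kernel, no SRAM tiling), with illustrative names.

```python
import numpy as np

def tiled_attention(Q, K, V, block=128):
    # Conceptual sketch of IO-aware exact attention: key/value blocks are
    # folded in with a running ("online") softmax, so the full N x N score
    # matrix is never materialized. The real FlashAttention speedup comes from
    # a fused kernel that keeps each tile in on-chip SRAM; this shows the math only.
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((N, V.shape[1]))
    row_max = np.full(N, -np.inf)
    row_sum = np.zeros(N)
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T * scale                         # (N, block) partial scores
        new_max = np.maximum(row_max, S.max(axis=1))
        rescale = np.exp(row_max - new_max)          # correct earlier partial sums
        P = np.exp(S - new_max[:, None])
        row_sum = row_sum * rescale + P.sum(axis=1)
        out = out * rescale[:, None] + P @ Vb
        row_max = new_max
    return out / row_sum[:, None]                    # equals softmax(QK^T/sqrt(d)) V
```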

Extending context window of large language models via positional interpolation

S Chen, S Wong, L Chen, Y Tian - arXiv preprint arXiv:2306.15595, 2023 - arxiv.org
We present Position Interpolation (PI) that extends the context window sizes of RoPE-based
pretrained LLMs such as LLaMA models to up to 32768 with minimal fine-tuning (within …
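
The core of Position Interpolation is a one-line change to rotary embeddings: positions in the longer sequence are scaled down by the ratio of the pretraining length to the target length, so every position falls inside the range the model saw during training instead of being extrapolated. A minimal sketch, with illustrative function names and the common RoPE base of 10000 assumed:

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0):
    # Standard RoPE frequencies: theta_i = base^(-2i/dim); returns rotation
    # angles of shape (len(positions), dim/2).
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    return np.outer(positions, inv_freq)

def interpolated_angles(positions, dim, train_len, target_len):
    # Position Interpolation: rescale positions by train_len / target_len so a
    # target_len-token sequence maps back into the pretraining position range.
    scale = train_len / target_len               # e.g. 2048 / 8192 = 1/4
    return rope_angles(np.asarray(positions) * scale, dim)

# Example: positions 0..8191 squeezed into the original 0..2047 range.
angles = interpolated_angles(np.arange(8192), dim=128, train_len=2048, target_len=8192)
```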

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Transformer quality in linear time

W Hua, Z Dai, H Liu, Q Le - International conference on …, 2022 - proceedings.mlr.press
We revisit the design choices in Transformers, and propose methods to address their
weaknesses in handling long sequences. First, we propose a simple layer named gated …
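
The layer referred to is the gated attention unit, in which a single cheap attention map gates an expanded representation so that one layer can stand in for the usual attention-plus-FFN pair. The sketch below is a simplified, approximate rendition: the per-dimension query/key offsets, relative-position bias, and the chunked linear-attention variant described in the paper are omitted, and all names are illustrative.

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def gated_attention_unit(X, Wu, Wv, Wz, Wo):
    # Simplified sketch in the spirit of a gated attention unit: a single-head
    # relu^2 attention map (no softmax) gates an expanded value branch, and the
    # result is projected back to the model dimension.
    n, _ = X.shape
    U = silu(X @ Wu)                         # (n, e) gate branch
    V = silu(X @ Wv)                         # (n, e) value branch
    Z = silu(X @ Wz)                         # (n, s) shared small projection
    A = np.maximum(Z @ Z.T / n, 0.0) ** 2    # (n, n) relu^2 attention weights
    return (U * (A @ V)) @ Wo                # gate the attended values, project back
```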

AI-generated content (AIGC): A survey

J Wu, W Gan, Z Chen, S Wan, H Lin - arXiv preprint arXiv:2304.06632, 2023 - arxiv.org
To address the challenges of digital intelligence in the digital economy, artificial intelligence-
generated content (AIGC) has emerged. AIGC uses artificial intelligence to assist or replace …

Memorizing transformers

Y Wu, MN Rabe, DL Hutchins, C Szegedy - arXiv preprint arXiv …, 2022 - arxiv.org
Language models typically need to be trained or finetuned in order to acquire new
knowledge, which involves updating their weights. We instead envision language models …
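
The mechanism envisioned here is attention into a non-trainable external memory of past (key, value) pairs: at each step the query retrieves its top-k nearest stored keys, and the retrieved result is mixed with ordinary local attention. A rough NumPy sketch under those assumptions follows; the fixed scalar gate and exact top-k search are simplifications (the paper learns a per-head gate and uses approximate kNN so the memory can hold very long histories).

```python
import numpy as np

def knn_memory_attention(q, local_K, local_V, mem_K, mem_V, k=32, gate=0.5):
    # Conceptual kNN-augmented attention: the query attends over its local
    # context and over the top-k nearest (key, value) pairs retrieved from an
    # external memory, and the two results are mixed by a gate.
    def attend(q, K, V):
        s = K @ q / np.sqrt(q.shape[-1])
        w = np.exp(s - s.max()); w /= w.sum()
        return w @ V

    idx = np.argsort(mem_K @ q)[-k:]          # exact top-k retrieval by dot product
    local_out = attend(q, local_K, local_V)
    mem_out = attend(q, mem_K[idx], mem_V[idx])
    return gate * mem_out + (1 - gate) * local_out
```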