Advancing transformer architecture in long-context large language models: A comprehensive survey
With the explosion of interest ignited by ChatGPT, Transformer-based Large Language Models (LLMs)
have paved a revolutionary path toward Artificial General Intelligence (AGI) and have been …
Socialized learning: A survey of the paradigm shift for edge intelligence in networked systems
Propelled by artificial intelligence (AI) and big data, edge intelligence (EI)
has emerged as a nascent computing paradigm, synthesizing AI with edge computing (EC) …
EfficientViT: Memory efficient vision transformer with cascaded group attention
Vision transformers have shown great success due to their high model capacity.
However, their remarkable performance is accompanied by heavy computation costs, which …
FLatten Transformer: Vision transformer using focused linear attention
The quadratic computation complexity of self-attention has been a persistent challenge
when applying Transformer models to vision tasks. Linear attention, on the other hand, offers …
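The contrast drawn in this snippet is easy to make concrete. Below is a minimal Python sketch of generic kernelized linear attention next to standard softmax attention: reordering the products so that phi(K)^T V is computed first removes the N x N score matrix and makes the cost linear in sequence length. The simple positive feature map used here is an assumption for illustration, not the paper's focused attention function.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: materializes an (N, N) score matrix -> O(N^2)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # (N, N)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                  # (N, d_v)

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Kernelized attention: compute phi(K)^T V first, then multiply by phi(Q).

    phi(K).T @ V costs O(N * d * d_v), so sequence length N only appears
    linearly. `phi` is an assumed simple positive feature map, not the
    focused function proposed in the paper.
    """
    KV = phi(K).T @ V                                   # (d, d_v), no N x N matrix
    Z = phi(Q) @ phi(K).sum(axis=0, keepdims=True).T    # (N, 1) normalizer
    return (phi(Q) @ KV) / Z

N, d = 128, 16
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, N, d))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```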
FlashAttention: Fast and memory-efficient exact attention with IO-awareness
Transformers are slow and memory-hungry on long sequences, since the time and memory
complexity of self-attention are quadratic in sequence length. Approximate attention …
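As a rough picture of the memory problem and of the tiling idea, the toy sketch below compares attention that materializes the full N x N score matrix with an online-softmax version that walks over key/value blocks and keeps only one (N, block) tile of scores at a time. This is a NumPy illustration of the general blockwise idea only, not the paper's IO-aware CUDA kernels.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Materializes the full (N, N) score matrix: O(N^2) memory."""
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def blocked_attention(Q, K, V, block=64):
    """Online-softmax attention over key/value blocks.

    Keeps a running max `m`, normalizer `l`, and output accumulator `o`
    per query, so only an (N, block) score tile exists at any time.
    """
    N, d = Q.shape
    o = np.zeros((N, V.shape[-1]))
    m = np.full((N, 1), -np.inf)
    l = np.zeros((N, 1))
    for start in range(0, N, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T / np.sqrt(d)                      # (N, block) tile only
        m_new = np.maximum(m, S.max(axis=-1, keepdims=True))
        P = np.exp(S - m_new)
        scale = np.exp(m - m_new)                      # rescale previous statistics
        l = l * scale + P.sum(axis=-1, keepdims=True)
        o = o * scale + P @ Vb
        m = m_new
    return o / l

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 256, 32))
assert np.allclose(naive_attention(Q, K, V), blocked_attention(Q, K, V))
```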
Extending context window of large language models via positional interpolation
We present Position Interpolation (PI) that extends the context window sizes of RoPE-based
pretrained LLMs such as LLaMA models to up to 32768 tokens with minimal fine-tuning (within …
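The rescaling at the heart of Position Interpolation can be shown in a few lines: positions in the extended window are multiplied by the ratio of the original to the extended context length before the rotary (RoPE) angles are computed, so they map back into the range seen during pretraining instead of extrapolating beyond it. The sketch below is a minimal numeric illustration, not the authors' code; the head dimension and RoPE base are assumed defaults, and the lengths 2048 and 32768 reflect the snippet's LLaMA setting.

```python
import numpy as np

def rope_angles(positions, dim=128, base=10000.0):
    """Rotary angles theta_i = pos / base^(2i/dim) for each position."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))   # (dim/2,)
    return np.outer(positions, inv_freq)                      # (len, dim/2)

def interpolated_angles(positions, train_len=2048, target_len=32768,
                        dim=128, base=10000.0):
    """Position Interpolation: scale positions by train_len / target_len so
    that indices in [0, target_len) map back into the trained range
    [0, train_len) before the rotary angles are computed."""
    scale = train_len / target_len
    return rope_angles(positions * scale, dim=dim, base=base)

pos = np.arange(0, 32768, 4096)
print(rope_angles(pos)[:, 0])          # extrapolated angles exceed the trained range
print(interpolated_angles(pos)[:, 0])  # interpolated angles stay within it
```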
On the opportunities and risks of foundation models
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …
Transformer quality in linear time
We revisit the design choices in Transformers, and propose methods to address their
weaknesses in handling long sequences. First, we propose a simple layer named gated …
AI-generated content (AIGC): A survey
J Wu, W Gan, Z Chen, S Wan, H Lin - arXiv preprint arXiv:2304.06632, 2023 - arxiv.org
To address the challenges of digital intelligence in the digital economy, artificial intelligence-
generated content (AIGC) has emerged. AIGC uses artificial intelligence to assist or replace …
Memorizing transformers
Language models typically need to be trained or finetuned in order to acquire new
knowledge, which involves updating their weights. We instead envision language models …
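The snippet is cut off before the mechanism, but the approach the paper is known for, looking up a cache of past (key, value) pairs with an approximate k-nearest-neighbour search instead of writing new knowledge into the weights, can be sketched roughly as below. The class and its methods are a toy paraphrase for illustration, not the paper's implementation.

```python
import numpy as np

class KNNMemory:
    """Toy external memory: stores past (key, value) pairs and retrieves
    the k nearest keys to a query by dot-product similarity."""

    def __init__(self, dim):
        self.keys = np.empty((0, dim))
        self.values = np.empty((0, dim))

    def add(self, keys, values):
        # New knowledge is appended to the memory rather than trained
        # into the model weights.
        self.keys = np.vstack([self.keys, keys])
        self.values = np.vstack([self.values, values])

    def retrieve(self, query, k=4):
        sims = self.keys @ query                        # (num_stored,)
        top = np.argsort(-sims)[:k]                     # indices of k nearest keys
        weights = np.exp(sims[top] - sims[top].max())
        weights /= weights.sum()
        return weights @ self.values[top]               # attention over the k hits

rng = np.random.default_rng(0)
mem = KNNMemory(dim=16)
mem.add(rng.normal(size=(1024, 16)), rng.normal(size=(1024, 16)))
print(mem.retrieve(rng.normal(size=16)).shape)          # (16,)
```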