Advancing transformer architecture in long-context large language models: A comprehensive survey
With the explosion of interest ignited by ChatGPT, Transformer-based Large Language Models (LLMs)
have paved a revolutionary path toward Artificial General Intelligence (AGI) and have been …
The impact of positional encoding on length generalization in transformers
Length generalization, the ability to generalize from small training context sizes to larger
ones, is a critical challenge in the development of Transformer-based language models …
LM-Infinite: Simple on-the-fly length generalization for large language models
In recent years, there have been remarkable advancements in the performance of
Transformer-based Large Language Models (LLMs) across various domains. As these LLMs …
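The snippet above is cut off before describing the method itself. As a minimal sketch of the kind of on-the-fly technique the title points to, the Python code below builds a Λ-shaped causal attention mask that lets each query attend to a handful of initial tokens plus a sliding local window, so a model trained on short contexts can be run on longer inputs without fine-tuning; the function name and the n_global / window parameters are illustrative assumptions, not the paper's actual interface.

import numpy as np

def lambda_shaped_mask(seq_len: int, n_global: int = 4, window: int = 1024) -> np.ndarray:
    """Boolean causal mask: query i may attend to key j iff j <= i and
    (j < n_global or i - j < window). Illustrative sketch only."""
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    causal = j <= i
    keep = (j < n_global) | ((i - j) < window)
    return causal & keep

# Example: with 1 global token and a window of 3, token 5 attends to
# token 0 (global) and tokens 3-5 (local window).
print(lambda_shaped_mask(6, n_global=1, window=3).astype(int))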
The What, Why, and How of Context Length Extension Techniques in Large Language Models--A Detailed Survey
The advent of Large Language Models (LLMs) represents a notable breakthrough in Natural
Language Processing (NLP), contributing to substantial progress in both text …
Length generalization in arithmetic transformers
We examine how transformers cope with two challenges: learning basic integer arithmetic,
and generalizing to longer sequences than seen during training. We find that relative …
Learning to reason and memorize with self-notes
Large language models have been shown to struggle with multi-step reasoning, and do not
retain previous reasoning steps for future use. We propose a simple method for solving both …
KERPLE: Kernelized relative positional embedding for length extrapolation
Relative positional embeddings (RPE) have received considerable attention since RPEs
effectively model the relative distance among tokens and enable length extrapolation. We …
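The snippet ends before the kernelized bias is defined. As a hedged sketch of how a relative positional bias of this kind enters attention, the code below adds a distance-dependent penalty -r1 * log(1 + r2 * |i - j|) to the pre-softmax logits; this logarithmic form matches the commonly cited KERPLE logarithmic variant, though the snippet itself does not confirm it, and the fixed scalar values stand in for what the paper treats as learnable per-head parameters.

import numpy as np

def kerple_log_bias(seq_len: int, r1: float = 1.0, r2: float = 1.0) -> np.ndarray:
    """Relative-position bias b[i, j] = -r1 * log(1 + r2 * |i - j|).
    r1 and r2 are fixed here for illustration; in KERPLE they are
    learnable positive scalars per attention head."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return -r1 * np.log1p(r2 * np.abs(i - j))

def attention_with_bias(q, k, v, bias):
    """Scaled dot-product attention with an additive relative-position bias."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d) + bias          # bias depends only on |i - j|
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# Because the bias depends only on relative distance, the same r1, r2 apply
# unchanged to sequences longer than those seen in training (extrapolation).
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16)); k = rng.normal(size=(8, 16)); v = rng.normal(size=(8, 16))
out = attention_with_bias(q, k, v, kerple_log_bias(8))
print(out.shape)  # (8, 16)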
Positional description matters for transformers arithmetic
Transformers, central to the successes in modern Natural Language Processing, often falter
on arithmetic tasks despite their vast capabilities--which paradoxically include remarkable …
AdaMCT: adaptive mixture of CNN-transformer for sequential recommendation
Sequential recommendation (SR) aims to model users' dynamic preferences from a series of
interactions. A pivotal challenge in user modeling for SR lies in the inherent variability of …
SMLP4Rec: An Efficient all-MLP Architecture for Sequential Recommendations
Self-attention models have achieved state-of-the-art performance in sequential
recommender systems by capturing the sequential dependencies among user–item …