Advancing transformer architecture in long-context large language models: A comprehensive survey

Y Huang, J Xu, J Lai, Z Jiang, T Chen, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Sparked by the advent of ChatGPT, Transformer-based Large Language Models (LLMs) have paved
a revolutionary path toward Artificial General Intelligence (AGI) and have been …

The impact of positional encoding on length generalization in transformers

A Kazemnejad, I Padhi… - Advances in …, 2024 - proceedings.neurips.cc
Length generalization, the ability to generalize from small training context sizes to larger
ones, is a critical challenge in the development of Transformer-based language models …
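
This paper compares how different positional-encoding schemes behave when the test context is longer than anything seen during training, including decoder-only models with no positional encoding at all (NoPE). A minimal sketch of how two commonly compared variants enter the attention logits, assuming a single attention head in PyTorch (names and shapes here are illustrative, not the paper's code):

```python
import torch

def attention_scores(q, k, scheme="nope", alibi_slope=0.5):
    """Causal attention logits under two positional-encoding schemes.

    q, k: (seq_len, head_dim) for a single head.
    - "nope":  no positional information is injected into the scores;
               the model relies on the causal mask alone.
    - "alibi": a linear bias -slope * (i - j) is added to each score,
               penalizing attention to distant past tokens.
    """
    seq_len = q.size(0)
    scores = q @ k.T / q.size(-1) ** 0.5            # (seq_len, seq_len)
    if scheme == "alibi":
        i = torch.arange(seq_len).unsqueeze(1)      # query positions
        j = torch.arange(seq_len).unsqueeze(0)      # key positions
        scores = scores - alibi_slope * (i - j).clamp(min=0)
    causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), 1)
    return scores.masked_fill(causal, float("-inf"))

q, k = torch.randn(6, 16), torch.randn(6, 16)
print(attention_scores(q, k, "nope")[-1])
print(attention_scores(q, k, "alibi")[-1])
```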

Lm-infinite: Simple on-the-fly length generalization for large language models

C Han, Q Wang, W Xiong, Y Chen, H Ji… - arXiv preprint arXiv …, 2023 - arxiv.org
In recent years, there have been remarkable advancements in the performance of
Transformer-based Large Language Models (LLMs) across various domains. As these LLMs …
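
LM-Infinite's on-the-fly fix combines a Λ-shaped attention mask (a small always-visible prefix plus a sliding window of recent tokens) with a ceiling on the relative distances fed to the positional encoding. A minimal sketch of the mask half of the idea, assuming causal single-head attention (parameter names are illustrative):

```python
import torch

def lambda_shaped_mask(seq_len, n_global=4, n_local=512):
    """Boolean mask (True = may attend) with a Lambda-shaped pattern:
    each query attends to the first `n_global` tokens and to the most
    recent `n_local` tokens, and never to future tokens (causal)."""
    i = torch.arange(seq_len).unsqueeze(1)   # query index
    j = torch.arange(seq_len).unsqueeze(0)   # key index
    causal = j <= i
    global_branch = j < n_global             # always-visible prefix
    local_branch = (i - j) < n_local         # sliding window of recent tokens
    return causal & (global_branch | local_branch)

mask = lambda_shaped_mask(seq_len=8, n_global=2, n_local=3)
print(mask.int())
```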

The What, Why, and How of Context Length Extension Techniques in Large Language Models--A Detailed Survey

S Pawar, SM Tonmoy, SM Zaman, V Jain… - arXiv preprint arXiv …, 2024 - arxiv.org
The advent of Large Language Models (LLMs) represents a notable breakthrough in Natural
Language Processing (NLP), contributing to substantial progress in both text …

Length generalization in arithmetic transformers

S Jelassi, S d'Ascoli, C Domingo-Enrich, Y Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
We examine how transformers cope with two challenges: learning basic integer arithmetic,
and generalizing to longer sequences than seen during training. We find that relative …
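
The length-generalization setup here trains on operands up to some digit count and evaluates on strictly longer ones. A small sketch of that data split for integer addition (the exact formatting and tokenization in the paper differ; this only illustrates the train/test length protocol):

```python
import random

def addition_examples(n_examples, max_digits):
    """Generate 'a+b=' prompts paired with the answer, with operands of up
    to `max_digits` digits, as plain text sequences."""
    examples = []
    for _ in range(n_examples):
        a = random.randint(0, 10 ** max_digits - 1)
        b = random.randint(0, 10 ** max_digits - 1)
        examples.append((f"{a}+{b}=", str(a + b)))
    return examples

train = addition_examples(1000, max_digits=5)         # lengths seen in training
test_longer = addition_examples(200, max_digits=10)   # length-generalization eval
print(train[0], test_longer[0])
```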

Learning to reason and memorize with self-notes

J Lanchantin, S Toshniwal, J Weston… - Advances in Neural …, 2024 - proceedings.neurips.cc
Large language models have been shown to struggle with multi-step reasoning, and do not
retain previous reasoning steps for future use. We propose a simple method for solving both …
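
The method lets the model deviate from the input at any point to write a "self-note", which is inserted into the context so later steps can condition on it. A minimal sketch of that inference loop, assuming a hypothetical `model.next_token(context)` interface and placeholder `<note>` delimiters (all names below are illustrative, not the authors' implementation):

```python
START_NOTE, END_NOTE = "<note>", "</note>"

def read_with_self_notes(model, input_tokens, max_note_len=32):
    """Process the input token by token; whenever the model emits START_NOTE,
    let it generate a note (up to max_note_len tokens) that is spliced into
    the context so later steps can reuse it."""
    context = []
    for tok in input_tokens:
        context.append(tok)
        # Hypothetical API: returns the most likely next token given `context`.
        if model.next_token(context) == START_NOTE:
            context.append(START_NOTE)
            for _ in range(max_note_len):
                note_tok = model.next_token(context)
                context.append(note_tok)
                if note_tok == END_NOTE:
                    break
    return context  # input interleaved with generated self-notes

class NeverNotes:
    """Stand-in model that never starts a note, so the sketch runs as-is."""
    def next_token(self, context):
        return "<pad>"

print(read_with_self_notes(NeverNotes(), ["The", "capital", "of", "France"]))
```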

Kerple: Kernelized relative positional embedding for length extrapolation

TC Chi, TH Fan, PJ Ramadge… - Advances in Neural …, 2022 - proceedings.neurips.cc
Relative positional embeddings (RPE) have received considerable attention since RPEs
effectively model the relative distance among tokens and enable length extrapolation. We …
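
KERPLE derives relative positional biases from conditionally positive definite kernels of the token distance; its logarithmic variant subtracts r1 * log(1 + r2 * |m - n|) from the attention logits, with learnable r1, r2 > 0. A minimal sketch of that variant (the formula only, not the authors' implementation):

```python
import torch
import torch.nn as nn

class KerpleLogBias(nn.Module):
    """Logarithmic KERPLE bias: -r1 * log(1 + r2 * |m - n|), with r1, r2 > 0."""
    def __init__(self):
        super().__init__()
        # Parametrize in log space so r1 and r2 stay positive during training.
        self.log_r1 = nn.Parameter(torch.zeros(()))
        self.log_r2 = nn.Parameter(torch.zeros(()))

    def forward(self, scores):                        # scores: (..., L, L)
        L = scores.size(-1)
        dist = (torch.arange(L).unsqueeze(1) - torch.arange(L).unsqueeze(0)).abs()
        r1, r2 = self.log_r1.exp(), self.log_r2.exp()
        return scores - r1 * torch.log1p(r2 * dist)

bias = KerpleLogBias()
print(bias(torch.zeros(1, 5, 5))[0, -1])  # bias decays with token distance
```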

Positional description matters for transformers arithmetic

R Shen, S Bubeck, R Eldan, YT Lee, Y Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers, central to the successes in modern Natural Language Processing, often falter
on arithmetic tasks despite their vast capabilities--which paradoxically include remarkable …
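
Work in this line attributes much of the difficulty to how digit positions are exposed to the model, and remedies range from changing the positional encoding to changing the textual format of the numbers. As one generic illustration of the format side (not necessarily this paper's exact recipe), emitting the answer least-significant digit first aligns each output digit with the operand digits and carry it depends on:

```python
def format_addition(a: int, b: int, reverse_answer: bool = True) -> str:
    """Render 'a+b=answer', optionally with the answer digits reversed so the
    model can emit the least-significant digit (and its carry) first."""
    answer = str(a + b)
    if reverse_answer:
        answer = answer[::-1]
    return f"{a}+{b}={answer}"

print(format_addition(357, 88))         # '357+88=544' (445 written in reverse)
print(format_addition(357, 88, False))  # '357+88=445'
```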

AdaMCT: adaptive mixture of CNN-transformer for sequential recommendation

J Jiang, P Zhang, Y Luo, C Li, JB Kim, K Zhang… - Proceedings of the …, 2023 - dl.acm.org
Sequential recommendation (SR) aims to model users' dynamic preferences from a series of
interactions. A pivotal challenge in user modeling for SR lies in the inherent variability of …
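
The core idea here is to combine a convolutional branch that captures local, short-term patterns with a self-attention branch that captures long-range ones, and to let the model learn how to balance the two. A minimal sketch of such a gated local/global mixture, assuming a per-channel sigmoid gate (AdaMCT's exact gating and layer design may differ):

```python
import torch
import torch.nn as nn

class LocalGlobalMix(nn.Module):
    """Adaptively mix a depthwise-convolution branch (local) with a
    self-attention branch (global) via a learned per-channel gate."""
    def __init__(self, dim, n_heads=4, kernel_size=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2,
                              groups=dim)           # depthwise convolution
        self.gate = nn.Parameter(torch.zeros(dim))  # sigmoid(0) = 0.5 at init

    def forward(self, x):                           # x: (batch, seq, dim)
        global_out, _ = self.attn(x, x, x)
        local_out = self.conv(x.transpose(1, 2)).transpose(1, 2)
        g = torch.sigmoid(self.gate)                # per-channel mixing weight
        return g * global_out + (1 - g) * local_out

block = LocalGlobalMix(dim=32)
print(block(torch.randn(2, 10, 32)).shape)          # torch.Size([2, 10, 32])
```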

SMLP4Rec: An Efficient all-MLP Architecture for Sequential Recommendations

J Gao, X Zhao, M Li, M Zhao, R Wu, R Guo… - ACM Transactions on …, 2024 - dl.acm.org
Self-attention models have achieved state-of-the-art performance in sequential
recommender systems by capturing the sequential dependencies among user–item …
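
An all-MLP sequential recommender replaces self-attention with MLPs that mix information along different axes of the input. A minimal MLP-Mixer-style block in that spirit, assuming a (batch, sequence, embedding) input (SMLP4Rec additionally mixes along a feature axis and differs in detail):

```python
import torch
import torch.nn as nn

class MLPMixBlock(nn.Module):
    """Token-mixing MLP over the sequence axis followed by a channel-mixing
    MLP over the embedding axis, each with a residual connection."""
    def __init__(self, seq_len, dim, hidden=64):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(nn.Linear(seq_len, hidden), nn.GELU(),
                                       nn.Linear(hidden, seq_len))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(),
                                         nn.Linear(hidden, dim))

    def forward(self, x):                            # x: (batch, seq_len, dim)
        # Mix across item positions (sequence axis).
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        # Mix across embedding channels.
        return x + self.channel_mlp(self.norm2(x))

block = MLPMixBlock(seq_len=20, dim=32)
print(block(torch.randn(4, 20, 32)).shape)           # torch.Size([4, 20, 32])
```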