Advancing transformer architecture in long-context large language models: A comprehensive survey

Y Huang, J Xu, J Lai, Z Jiang, T Chen, Z Li… - arXiv preprint arXiv…, 2023 - arxiv.org
Transformer-based Large Language Models (LLMs) have been applied in diverse areas
such as knowledge bases, human interfaces, and dynamic agents, marking a stride …

Gemini: A family of highly capable multimodal models

G Team, R Anil, S Borgeaud, JB Alayrac, J Yu… - arXiv preprint arXiv…, 2023 - arxiv.org
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable
capabilities across image, audio, video, and text understanding. The Gemini family consists …

Llama 2: Open foundation and fine-tuned chat models

H Touvron, L Martin, K Stone, P Albert… - arXiv preprint arXiv…, 2023 - arxiv.org
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large
language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine …

Large language models for software engineering: Survey and open problems

A Fan, B Gokkaya, M Harman… - 2023 IEEE/ACM …, 2023 - ieeexplore.ieee.org
This paper provides a survey of the emerging area of Large Language Models (LLMs) for
Software Engineering (SE). It also sets out open research challenges for the application of …

Retentive network: A successor to transformer for large language models

Y Sun, L Dong, S Huang, S Ma, Y Xia, J Xue… - arXiv preprint arXiv…, 2023 - arxiv.org
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large
language models, simultaneously achieving training parallelism, low-cost inference, and …

Extending context window of large language models via positional interpolation

S Chen, S Wong, L Chen, Y Tian - arXiv preprint arXiv:2306.15595, 2023 - arxiv.org
We present Position Interpolation (PI) that extends the context window sizes of RoPE-based
pretrained LLMs such as LLaMA models to up to 32768 with minimal fine-tuning (within …
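
The interpolation itself is simple to illustrate: before computing RoPE rotation angles, positions in the extended window are rescaled so they fall back inside the range the model saw during pretraining. Below is a minimal NumPy sketch of that idea, assuming a toy head dimension and the standard RoPE base of 10000; the function names and example lengths are illustrative, not taken from the paper's code.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    # Position Interpolation in one line: multiply positions by
    # scale = trained_len / extended_len so out-of-range positions
    # are squeezed back into the range seen during pretraining.
    pos = np.asarray(positions, dtype=np.float64) * scale
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(pos, inv_freq)          # (seq_len, dim // 2) rotation angles

def apply_rope(x, angles):
    # Rotate consecutive (even, odd) feature pairs of a query/key block.
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Toy example: a model trained with 2048-token RoPE, evaluated at 8192 tokens.
trained_len, extended_len, dim = 2048, 8192, 64
q = np.random.randn(extended_len, dim)
q_interp = apply_rope(q, rope_angles(np.arange(extended_len), dim,
                                     scale=trained_len / extended_len))
```

With scale = 1.0 this reduces to ordinary RoPE; with scale < 1.0 the extended positions are interpolated rather than extrapolated, which is the property the paper's fine-tuning relies on.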

FlashAttention-3: Fast and accurate attention with asynchrony and low-precision

J Shah, G Bikshandi, Y Zhang… - Advances in …, 2025 - proceedings.neurips.cc
Attention, as a core layer of the ubiquitous Transformer architecture, is the bottleneck for
large language models and long-context applications. FlashAttention elaborated an approach to speed up …
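
The idea FlashAttention builds on can be sketched without any GPU detail: attention is computed block by block with an online softmax, so the full attention matrix is never materialized. The NumPy sketch below illustrates only that generic tiling scheme under assumed shapes; it does not capture FlashAttention-3's Hopper-specific asynchrony or low-precision pipelining.

```python
import numpy as np

def tiled_attention(Q, K, V, block=128):
    # Process K/V in blocks, keeping a running row-wise max (m) and
    # softmax normalizer (l) so the N x N score matrix is never stored.
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)
    m = np.full(n, -np.inf)              # running max of scores per query
    l = np.zeros(n)                      # running softmax denominator
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        s = (Q @ Kb.T) * scale                   # scores for this block only
        m_new = np.maximum(m, s.max(axis=1))
        p = np.exp(s - m_new[:, None])           # block-local softmax numerators
        alpha = np.exp(m - m_new)                # rescales previously accumulated state
        l = l * alpha + p.sum(axis=1)
        out = out * alpha[:, None] + p @ Vb
        m = m_new
    return out / l[:, None]

# Agrees with the naive softmax(Q K^T / sqrt(d)) @ V up to floating-point error.
q = k = v = np.random.randn(512, 64)
s = (q @ k.T) / np.sqrt(64)
p = np.exp(s - s.max(axis=1, keepdims=True))
assert np.allclose(tiled_attention(q, k, v), (p / p.sum(axis=1, keepdims=True)) @ v)
```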

LongBench: A bilingual, multitask benchmark for long context understanding

Y Bai, X Lv, J Zhang, H Lyu, J Tang, Z Huang… - arXiv preprint arXiv…, 2023 - arxiv.org
Although large language models (LLMs) demonstrate impressive performance for many
language tasks, most of them can only handle texts a few thousand tokens long, limiting their …

Diagonal state spaces are as effective as structured state spaces

A Gupta, A Gu, J Berant - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Modeling long range dependencies in sequential data is a fundamental step towards
attaining human-level performance in many modalities such as text, vision, audio and video …

Same task, more tokens: the impact of input length on the reasoning performance of large language models

M Levy, A Jacoby, Y Goldberg - arXiv preprint arXiv:2402.14848, 2024 - arxiv.org
This paper explores the impact of extending input lengths on the capabilities of Large
Language Models (LLMs). Despite LLMs' advancements in recent times, their performance …