A survey of large language models

WX Zhao, K Zhou, J Li, T Tang… - arXiv preprint arXiv …, 2023 - paper-notes.zhjwpku.com
Ever since the Turing Test was proposed in the 1950s, humans have explored how machines
can master language intelligence. Language is essentially a complex, intricate system of …

Yi: Open foundation models by 01.AI

A Young, B Chen, C Li, C Huang, G Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce the Yi model family, a series of language and multimodal models that
demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and …

You only cache once: Decoder-decoder architectures for language models

Y Sun, L Dong, Y Zhu, S Huang… - Advances in …, 2025 - proceedings.neurips.cc
We introduce a decoder-decoder architecture, YOCO, for large language models, which only
caches key-value pairs once. It consists of two components, i.e., a cross-decoder stacked …

RULER: What's the Real Context Size of Your Long-Context Language Models?

CP Hsieh, S Sun, S Kriman, S Acharya… - arXiv preprint arXiv …, 2024 - arxiv.org
The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of
information (the "needle") from long distractor texts (the "haystack"), has been widely …
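A minimal sketch of the needle-in-a-haystack idea described in this snippet: bury a known fact (the "needle") inside filler text (the "haystack") at a chosen depth and check whether the model's answer recovers it. This is an illustrative assumption, not code from the RULER paper, and `query_model` is a hypothetical stand-in for any LLM API call.

```python
def build_niah_prompt(needle: str, filler: str, n_filler: int, depth: float) -> str:
    """Insert `needle` at relative `depth` (0.0 = start, 1.0 = end) of a filler haystack."""
    haystack = [filler] * n_filler
    haystack.insert(int(depth * n_filler), needle)
    return " ".join(haystack) + "\n\nQuestion: What is the secret number mentioned above?"

def needle_retrieved(answer: str, expected: str) -> bool:
    """Score a single probe: did the expected needle value appear in the answer?"""
    return expected in answer

if __name__ == "__main__":
    needle = "The secret number is 7481."
    prompt = build_niah_prompt(needle, "The sky was a pale grey that morning.", 2000, 0.5)
    print(len(prompt.split()), "words of distractor context")
    # answer = query_model(prompt)            # hypothetical LLM call, not a real API
    # print(needle_retrieved(answer, "7481"))
```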

Long-context LLMs struggle with long in-context learning

T Li, G Zhang, QD Do, X Yue, W Chen - arXiv preprint arXiv:2404.02060, 2024 - arxiv.org
Large Language Models (LLMs) have made significant strides in handling long sequences.
Some models, such as Gemini, even claim to be capable of dealing with millions of tokens …

MLVU: A comprehensive benchmark for multi-task long video understanding

J Zhou, Y Shu, B Zhao, B Wu, S Xiao, X Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
The evaluation of Long Video Understanding (LVU) performance poses an important but
challenging research problem. Despite previous efforts, the existing video understanding …

Leave no context behind: Efficient infinite context transformers with infini-attention

T Munkhdalai, M Faruqui… - arXiv preprint …, 2024 - storage.prod.researchhub.com
This work introduces an efficient method to scale Transformer-based Large Language
Models (LLMs) to infinitely long inputs with bounded memory and computation. A key …

LongVILA: Scaling long-context visual language models for long videos

F Xue, Y Chen, D Li, Q Hu, L Zhu, X Li, Y Fang… - arXiv preprint arXiv …, 2024 - arxiv.org
Long-context capability is critical for multi-modal foundation models, especially for long
video understanding. We introduce LongVILA, a full-stack solution for long-context visual …

MInference 1.0: Accelerating pre-filling for long-context LLMs via dynamic sparse attention

H Jiang, Y Li, C Zhang, Q Wu, X Luo, S Ahn… - arXiv preprint arXiv …, 2024 - arxiv.org
The computational challenges of Large Language Model (LLM) inference remain a
significant barrier to their widespread deployment, especially as prompt lengths continue to …

LongAlign: A recipe for long context alignment of large language models

Y Bai, X Lv, J Zhang, Y He, J Qi, L Hou, J Tang… - arXiv preprint arXiv …, 2024 - arxiv.org
Extending large language models to effectively handle long contexts requires instruction fine-
tuning on input sequences of similar length. To address this, we present LongAlign, a recipe …