Gemma 2: Improving open language models at a practical size
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-
of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new …
Sora: A review on background, technology, limitations, and opportunities of large vision models
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …
A survey of large language models
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …
The Llama 3 herd of models
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …
Qwen2.5 technical report
In this report, we introduce Qwen2.5, a comprehensive series of large language models
(LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen 2.5 has …
Yi: Open foundation models by 01.ai
We introduce the Yi model family, a series of language and multimodal models that
demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and …
Mistral 7B
We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for
superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all …
Flashattention-2: Faster attention with better parallelism and work partitioning
T Dao - arXiv preprint arXiv:2307.08691, 2023 - arxiv.org
Scaling Transformers to longer sequence lengths has been a major problem in the last
several years, promising to improve performance in language modeling and high-resolution …
Flashattention-3: Fast and accurate attention with asynchrony and low-precision
J Shah, G Bikshandi, Y Zhang… - Advances in …, 2025 - proceedings.neurips.cc
Attention, as a core layer of the ubiquitous Transformer architecture, is the bottleneck for
large language models and long-context applications. FlashAttention elaborated an approach to speed up …
Sequence modeling and design from molecular to genome scale with Evo
The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an
organism's function. We present Evo, a long-context genomic foundation model with a …