Gemma 2: Improving open language models at a practical size

G Team, M Riviere, S Pathak, PG Sessa… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-
of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new …

Sora: A review on background, technology, limitations, and opportunities of large vision models

Y Liu, K Zhang, Y Li, Z Yan, C Gao, R Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

Qwen2.5 technical report

A Yang, B Yang, B Zhang, B Hui, B Zheng, B Yu… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we introduce Qwen2.5, a comprehensive series of large language models
(LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen2.5 has …

Yi: Open foundation models by 01.AI

A Young, B Chen, C Li, C Huang, G Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce the Yi model family, a series of language and multimodal models that
demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and …

Mistral 7B

AQ Jiang, A Sablayrolles, A Mensch, C Bamford… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for
superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all …

FlashAttention-2: Faster attention with better parallelism and work partitioning

T Dao - arXiv preprint arXiv:2307.08691, 2023 - arxiv.org
Scaling Transformers to longer sequence lengths has been a major problem in the last
several years, promising to improve performance in language modeling and high-resolution …

FlashAttention-3: Fast and accurate attention with asynchrony and low-precision

J Shah, G Bikshandi, Y Zhang… - Advances in …, 2025 - proceedings.neurips.cc
Attention, as a core layer of the ubiquitous Transformer architecture, is the bottleneck for
large language models and long-context applications. FlashAttention elaborated an approach to speed up …

Sequence modeling and design from molecular to genome scale with Evo

E Nguyen, M Poli, MG Durrant, B Kang, D Katrekar… - Science, 2024 - science.org
The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an
organism's function. We present Evo, a long-context genomic foundation model with a …