Advancing transformer architecture in long-context large language models: A comprehensive survey

Y Huang, J Xu, J Lai, Z Jiang, T Chen, Z Li… - arXiv preprint arXiv…, 2023 - arxiv.org
Transformer-based Large Language Models (LLMs) have been applied in diverse areas
such as knowledge bases, human interfaces, and dynamic agents, marking a stride …

Gemini: A family of highly capable multimodal models

G Team, R Anil, S Borgeaud, JB Alayrac, J Yu… - arXiv preprint arXiv…, 2023 - arxiv.org
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable
capabilities across image, audio, video, and text understanding. The Gemini family consists …

Llama 2: Open foundation and fine-tuned chat models

H Touvron, L Martin, K Stone, P Albert… - arXiv preprint arXiv…, 2023 - arxiv.org
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large
language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine …

Large language models for software engineering: Survey and open problems

A Fan, B Gokkaya, M Harman… - 2023 IEEE/ACM …, 2023 - ieeexplore.ieee.org
This paper provides a survey of the emerging area of Large Language Models (LLMs) for
Software Engineering (SE). It also sets out open research challenges for the application of …

Retentive network: A successor to transformer for large language models

Y Sun, L Dong, S Huang, S Ma, Y Xia, J Xue… - arXiv preprint arXiv…, 2023 - arxiv.org
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large
language models, simultaneously achieving training parallelism, low-cost inference, and …

Extending context window of large language models via positional interpolation

S Chen, S Wong, L Chen, Y Tian - arXiv preprint arXiv:2306.15595, 2023 - arxiv.org
We present Position Interpolation (PI) that extends the context window sizes of RoPE-based
pretrained LLMs such as LLaMA models to up to 32768 with minimal fine-tuning (within …
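
The interpolation itself is simple to illustrate: before computing RoPE rotation angles, positions in the extended window are rescaled so they fall back inside the range the model saw during pretraining. Below is a minimal NumPy sketch of that idea, assuming a toy head dimension and the standard RoPE base of 10000; the function names and example lengths are illustrative, not taken from the paper's code.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    # Position Interpolation in one line: multiply positions by
    # scale = trained_len / extended_len so out-of-range positions
    # are squeezed back into the range seen during pretraining.
    pos = np.asarray(positions, dtype=np.float64) * scale
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(pos, inv_freq)          # (seq_len, dim // 2) rotation angles

def apply_rope(x, angles):
    # Rotate consecutive (even, odd) feature pairs of a query/key block.
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Toy example: a model trained with 2048-token RoPE, evaluated at 8192 tokens.
trained_len, extended_len, dim = 2048, 8192, 64
q = np.random.randn(extended_len, dim)
q_interp = apply_rope(q, rope_angles(np.arange(extended_len), dim,
                                     scale=trained_len / extended_len))
```

With scale = 1.0 this reduces to ordinary RoPE; with scale < 1.0 the extended positions are interpolated rather than extrapolated, which is the property the paper's fine-tuning relies on.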

FlashAttention-3: Fast and accurate attention with asynchrony and low-precision

J Shah, G Bikshandi, Y Zhang… - Advances in …, 2025 - proceedings.neurips.cc
Attention, as a core layer of the ubiquitous Transformer architecture, is the bottleneck for
large language models and long-context applications. FlashAttention elaborated an approach to speed up …
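
The idea FlashAttention builds on can be sketched without any GPU detail: attention is computed block by block with an online softmax, so the full attention matrix is never materialized. The NumPy sketch below illustrates only that generic tiling scheme under assumed shapes; it does not capture FlashAttention-3's Hopper-specific asynchrony or low-precision pipelining.

```python
import numpy as np

def tiled_attention(Q, K, V, block=128):
    # Process K/V in blocks, keeping a running row-wise max (m) and
    # softmax normalizer (l) so the N x N score matrix is never stored.
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)
    m = np.full(n, -np.inf)              # running max of scores per query
    l = np.zeros(n)                      # running softmax denominator
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        s = (Q @ Kb.T) * scale                   # scores for this block only
        m_new = np.maximum(m, s.max(axis=1))
        p = np.exp(s - m_new[:, None])           # block-local softmax numerators
        alpha = np.exp(m - m_new)                # rescales previously accumulated state
        l = l * alpha + p.sum(axis=1)
        out = out * alpha[:, None] + p @ Vb
        m = m_new
    return out / l[:, None]

# Agrees with the naive softmax(Q K^T / sqrt(d)) @ V up to floating-point error.
q = k = v = np.random.randn(512, 64)
s = (q @ k.T) / np.sqrt(64)
p = np.exp(s - s.max(axis=1, keepdims=True))
assert np.allclose(tiled_attention(q, k, v), (p / p.sum(axis=1, keepdims=True)) @ v)
```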

LongBench: A bilingual, multitask benchmark for long context understanding

Y Bai, X Lv, J Zhang, H Lyu, J Tang, Z Huang… - arXiv preprint arXiv…, 2023 - arxiv.org
Although large language models (LLMs) demonstrate impressive performance for many
language tasks, most of them can only handle texts a few thousand tokens long, limiting their …

Diagonal state spaces are as effective as structured state spaces

A Gupta, A Gu, J Berant - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Modeling long range dependencies in sequential data is a fundamental step towards
attaining human-level performance in many modalities such as text, vision, audio and video …

Same task, more tokens: the impact of input length on the reasoning performance of large language models

M Levy, A Jacoby, Y Goldberg - arXiv preprint arXiv:2402.14848, 2024 - arxiv.org
This paper explores the impact of extending input lengths on the capabilities of Large
Language Models (LLMs). Despite LLMs' advancements in recent times, their performance …